March 21, 2022

Lab 7 Part 1 related preparation for the Midterm Exam

Review of general concepts and recent midterm exam questions

#### Links to original labs/exams

1. Lab 7 Part 1 the 3-element pipeline adder (slides 3-14)

ee457 lab7 P1 (dir) ee457 pipe 3elem adder Verilog.pdf

2. Q#1 from the midterm of Fall 2018 (slides 15-20)

ee457 MT Fall2018.pdf ee457 MT Fall2018 sol.pdf

3. Q#1 from the midterm of Fall 2019 (slides 21-22)

<u>ee457 MT Fall2019.pdf</u>

<u>ee457 MT Fall2019 sol.pdf</u>

4. Q#3 from the midterm of Spring 2020 (slides 23-26)
<a href="mailto:ee457">ee457</a> MT Spring2020.pdf

ee457 MT Spring2020 sol.pdf

5. Q#1 from the midterm of Fall 2020 (slides 27-39)

<u>ee457 MT Fall2020.pdf</u>

<u>ee457 MT Fall2020 sol.pdf</u>

6. Q#1 from the midterm of Spring 2021 (slides 40-44)

ee457 MT Sp2021.pdf ee457 MT Sp2021 sol.pdf

7. Q#1 from the midterm of Fall 2021 (slides 45-52)

ee457 MT Fall2021.pdf ee457 MT Fall2021 sol.pdf

### Lab 7 Part 1

### 3-element adder

#### Lab 7 Part 1 Block Diagram

#### 3-element adder

Seniors can only help from the WB stage.

Z-reg does not cause stalling.

X and/or Y registers cause stalling if they are dependent on their senior #1.

Priority in forwarding? Since only one senior can forward, there is no priority in forwarding.

Redundancy in forwarding: Since there is only one stage to receive forwarding help, there cannot be any redundant FMP (forwarding multiplexer pair).



- Complete all missing connections marked in dotted lines 
   Also complete the RA(Result Address) connection in ID stage (ID\_RA).
   Complete all five enable (EN) controls on the pipeline registers (including PC).
   Complete the forwarding paths into the four forwarding muxes.

- 4. Complete logic to inject bubble into the next stage if the current instruction is being stalled or being flushed.
- 5. Draw the logic on a separate page for generating STALL, X\_FORW1, Y\_FORW1, Z\_FORW1, Z\_FORW2.

**Pinelined 3-element Adder** Block Diagram LAB 7 Part1







Pinelined 3-element Adder Block Diagram LAB 7 Part1



# Comparison units 3 in IFRF + 6 Comp station in ID = 9 total





In some of the exams, we ask students to have Forwarding units in each of the EX stages, even through area wise and timing wise it is not a good idea.

Do not create GHOST copies of the stalled or the senior instruction when stalling a dependent junior! Inject a bubble!











**Bubble Injecting AND gate** 

HDU in our Lab 6 Part 4 (or any pipelined CPU) injects a bubble into the next (EX) stage, when it stalls an instruction in the ID stage.



Mux with a zeros input is to illustrate the Bubble injection concept | |



Actual implementation can be as simple as a bunch of AND gates (here one AND gate.

**ID RUN** 

# Do spurios stalls occur in the 5-stage CPU of Lab 6 Part 4 and/or this Lab 7 Part 1?



Stall logic in Lab 7 (all parts) will check to see that the ID\_RUN\_IN is true.





On power-on reset, you want to make sure that there are no RANDOM instructions in the pipeline.



Dual purpose of the WBFF in Lab 6 Part 4 Why don't we need a WBFF in Lab 7 Part 1?

Answer: No opcode here. Single-bit active-high RUN signal! So, no difficulty in converting the RANDOM instr in ID stage to a bubble. There is no branch instruction to flush the IF stage fall-through junior through wrist banding. Even if there is a branch instruction, we can inject a bubble into ID from the IF stage to flush the IF junior!

The reset signal (during power-on reset) asynchronously clears the RUN signal to replace the random instruction in the right 4 stages with a bubble.

The RESET signal also clears the PC so that the very first boot-up instruction at (PC=0) is fetched in IF stage.



### Q#1 Fall 2018 Midterm

### A 4-element adder

The 5-stage design:



Comparison units in the IFRF: \_\_\_\_\_

Comparison units in the comp station: \_\_\_\_\_

\_\_\_\_ used in stalling and \_\_\_\_ used for forwarding



Comparison units in the IFRF: \_\_\_\_\_\_\_

Comparison units in the comp station: \_\_\_\_\_\_\_\_

4 used in stalling and 4 used for forwarding

The 6-stage design:



Comparison units in the IFRF: \_\_\_\_\_

Comparison units in the comp station: \_\_\_\_\_

\_\_\_\_ used in stalling and \_\_\_\_ used for forwarding



Comparison units in the IFRF: \_4\_\_

Comparison units in the comp station: \_\_\_\_\_\_\_

5\_ used in stalling and 7\_ used for forwarding

#### The 5-stage and the 6-stage Each has its own advantage and disadvantage!

| If you want the 5-stage to look better, you write testbench with more instruction sequences such that |
|-------------------------------------------------------------------------------------------------------|
| If you want the 6-stage to look better, you write testbench with more instruction sequences such that |
|                                                                                                       |

If you want the 5-stage to look better, you write testbench with more instruction sequences such that either \$W or \$X is dependent on its senior #2. Then the 5-stage design does not stall but the 6-stage design would stall.

Basically 5-stage does better than the 6-stage for solving dependencies of SW and SX. SY is same. But for SZ, the 6-stage is better.

If you want the 6-stage to look better, you write testbench with more instruction sequences such that \$Z is dependent on its senior #1. Then the 6-stage design does not need to stall but the 5-stage design would stall.

Basically 5-stage does better than the 6-stage for solving dependencies of SW and SX. SY is same. But for SZ, the 6-stage is better.

### Q#1 Fall 2019 Midterm

A 4-element adder

One of the four is a constant 3

Do you want that to be the W or the X or the Y or the Z?

Ans: W or X to avoid as many stalls as possible

### W or X can be replaced by constant 3



# **Q#3 Spring 2020 MT**

The 3-element adder

With a Dummy Stage EX0



Cross out any unneeded muxes, comparison units, etc. and complete the remaining connections marked as

Pinelined 3-element Adder Block Diagram LAB 7 Part1 modified (EX0 added)





,









Note: Senior #1 is in EX1 in the original design. If the junior is dependent on his senior #1 for X or Y, he should be stalled for one clock.

Note: Senior #1 is in EX<mark>0</mark> in this new design. If the junior is dependent on his senior #1 for X or Y, he should be stalled for one clock.

### Q#1 Fall 2020 MT

The 3-element adder with a multiplier and a bypass around the 2nd adder



#### Q#1 Fall 2020 MT



| Instruction                  | Operation                      | Opcod | le   | 3  |
|------------------------------|--------------------------------|-------|------|----|
|                              |                                | MA M  | EA 1 | A2 |
| NOP                          |                                | 0 0   | 0    | 0  |
| <b>A2</b> \$R, \$X, \$Y;     | (\$R) <= (\$X) + (\$Y)         | 0 0   | 0    | 1  |
| <b>A3</b> \$R, \$X, \$Y, \$2 | (\$R) <= (\$X) + (\$Y) + (\$Z) | 0 0   | 1    | 0  |
| <b>M</b> \$R, \$X, \$Y;      | (\$R) <= (\$X) * (\$Y)         | 0 1   | . 0  | 0  |
| MA \$R, \$X, \$Y, \$2        | (\$R) <= (\$X) * (\$Y) + (\$Z) | 1 0   | 0    | 0  |
|                              |                                |       |      |    |



# Bypassing stage logic is used in Lab 7 Part 3

NOP

MOV R, X; (R) <= (X)

SUB3 R, X; R (R) <= X - 3

ADD4  $R, X; (R) \le (X) + 4$ 

ADD1  $R, X; (R) \le (X) + 1$ 









## On power-on reset, you want to make sure that there are no RANDOM instructions in the pipeline.



Dual purpose of the WBFF in Lab 6 Part 4 Why don't we need a WBFF in Lab 7 Part 1?

Answer: No opcode here. Single-bit active-high RUN signal! So, no difficulty in converting the RANDOM instr in ID stage to a bubble. There is no branch instruction to flush the IF stage fall-through junior through wrist banding. Even if there is a branch instruction, we can inject a bubble into ID from the IF stage to flush the IF junior!

\_\_\_\_\_(Like Unlike) the X and Y operands, which need two pairs of forwarding muxes in EX1, the Z operand \_\_\_\_\_ (needs doesn't need) a pair of forwarding muxes in EX1. Explain

Unlike the X and Y, the Z does not participate in any computation in the EX1 stage. So, unlike the X and Y, Z will not stall in the ID stage, even if his immediate senior (S#1) is a A3 or MA who cannot help from EX2. So, Z decides to

receive help from his senior #1 in the next clock, when Z is in EX2 and his senior #1 is in WB stage. Hence the second mux of Z (Z2Mux) is in EX2 unlike X2Mux and Y2 Mux which are in EX1.

EE457 MT - Fall 2020 2/14 (C) Copyright 2020 Gandhi Puvvada

EX2 EX1STALL M -Z1Mux L Z2Mux Y1Mux Y2Mux R1Mux Adder YD+ZD FZIM FZ2M Adder/Multiplier WB RD FY2M X2Mux X1Mux  $\overline{\mathbf{A}}/\mathbf{M}$ RA WB RA EX1 RA EX2 RA FX1M FX2M

Z has the luxury of waiting until it reaches EX2.



\_\_\_\_\_ (Like / Unlike) in Lab 7 P1, the comparison units used for stalling and the comparison units used for forwarding \_\_\_\_\_ (are / aren't) in disjoint groups

Consider different types of seniors: some agreeing to help in time and some asking you to stall.





#### Q#1 Fall 2020 MT



#### How about Lab 6 Par 4 early branch?

The pair of comparison units in the FU\_Br are also in HDU\_Br. A branch instr. in ID stage can get forwarding help from R-type senior #2 in MEM stage but he has to stall if that senior #2 in MEM is a LW instr. There are other such cases. Please see Lab 6 Part 5.



#### STALL\_D and STALL\_M



A stall in EX stage or beyond results in stalling the entire pipe!

DCM (Data Cache Miss) in the early or the late branch designs of the CPU.

Stall initiated by MULTIPLY operation in EX1 in this Q#1 of Fall 2020 MT.

If you do not stall the entire pipe, any forwarding help from the WB stage senior to the junior in EX stage would be lost:(

But the senior in WB has already helped the junior in EX stage, so why do we need to stall the entire pipe?

Well, combinational logic in EX stage cannot hold (store) the help received in a prior clock!

#### **Time-Space Diagram**

Complete the Time-Space diagram below. If there are excess rows in the table, just leave them blank. If there aren't enough rows to complete the given sequence, stop at the last row. Write "**Bubble**" or draw a bubble when needed.

| instr1: MA#1                                                 | MA \$4, \$3, \$ | 2, \$1      | CC#  | IF   | ID   | EX1    | EX2    | WB     |
|--------------------------------------------------------------|-----------------|-------------|------|------|------|--------|--------|--------|
| instr2: M#2                                                  | M \$4, \$4, \$  | 4           | CC1  | MA#1 |      |        |        |        |
| instr3: A3#3                                                 | A3 \$7, \$6, \$ | 5, \$4      | CC2  | M#2  | MA#1 |        |        |        |
| instr4: A2#4                                                 | A2 \$8, \$7, \$ | 4           | CC3  | A3#3 | M#2  | MA#1   |        |        |
| instr5: M#5                                                  | M \$4, \$8, \$  | 7           | CC4  | A3#3 | M#2  | MA#1   |        |        |
|                                                              |                 |             | CC5  | A3#3 | M#2  | Bubble | MA#1   |        |
|                                                              |                 | #<br>*      | CC6  | A2#4 | A3#3 | M#2    | Bubble | MA#1   |
|                                                              |                 | Cycle       | CC7  | A2#4 | A3#3 | M#2    | Bubble | MA#1   |
|                                                              |                 | ck (        | CC8  | M#5  | A2#4 | A3#3   | M#2    | Bubble |
|                                                              |                 | Clock       | CC9  | M#5  | A2#4 | Bubble | A3#3   | M#2    |
| I am trying to check to see that students are careful not to |                 |             | CC10 |      | M#5  | A2#4   | Bubble | A3#3   |
| MA or M is preparing to leav                                 |                 | #5 A2#4 Bub |      |      |      | Bubble |        |        |
|                                                              |                 | _           | CC12 |      |      | M#5    | A2#4   | Bubble |

In what clocks (write CC numbers) did **STALL\_D** go active? In what subsequent clocks did **STALL\_D** go inactive?

In what clocks (write CC numbers) did **STALL\_M** go active? In what subsequent clocks did **STALL M** go inactive?

| CC3, CC4, due to N | M#2's dependency<br>IA#1 CC | due to A2#4's depen<br>on A3#3 | idency |
|--------------------|-----------------------------|--------------------------------|--------|
| CC                 | 5                           | CC9                            |        |
| CC3 for MA#1       | CC6 for M#2                 | CC11 for M#5                   |        |
| CC4                | CC7                         | CC12                           |        |

## **Q#1 Spring 2021 MT**

The 3-element adder

with a SUB3 and a BXY



Separate HDU and FUs. So, instructions need to bring their source operands to later stages to compare with the destinations of their seniors.

Count of comparison units in boxes next to the HDU and FUs.

```
Instruction
                              Operation
NOP
         $R, $X, $Y;
AXY
                              (\$R) \le (\$X) + (\$Y)
         $R, $X, $Y;
AXYS3
                              (\$R) \le (\$X) + (\$Y) - 3
         R, SX, SY, SZ; (R) <= (R) + (R) + (R) + (R)
AXYZ
         R, SX, SY, SZ; (R) <= (R) + (R) -3 + (R)
AXYZS3
                           (PC) \le JJJJ \text{ if } (XplusY == 0)
         $X, $Y, JJJJ;
BXY
```

- BXY has one delay slot
- = do not flush Junior #1 even if BXY is successful
- = do not initiate bubble-injection from the Junior #1's stage



adequate!

Write a number in each of the 4 small boxes indicating the number of register ID comparison units on the next page and transcribe them here. \$0 is not a special register here.



Now arrive at the bare minimum number of comparison units we would need if we followed our Lab 7 Part 1 method of pooling all comparison units in a comparison station in ID stage.

comp station in ID stage 9 The 4 comparison units in the HDU are replicated in the FU\_EX1.

Do not help an instruction in the result path! You need to help in its source path!



### Q#1 Fall 2021 MT

Mr. Bruin created a 4-element adder similar to the Fall 2018 Question but it is neither a tree not a linear cascade!

See the MOV instruction support!

Mr. Bruin and Miss Bruin are the TAs for a similar course at the other school. They looked at our above Fall 2018 midterm question and solution. They wanted to design a midterm question for their course by combining the above two designs in the following fashion. Note that they are adding (W, X) in EX1 and (Y, Z) in EX2 and adding the two intermediate sums in the EX3.

Mr. Bruin's 6-stage solution block diagram:



Miss Bruin told Mr. Bruin that there is no need for any forwarding muxes in the EX2 and EX3 stages. Her design is given below:



#### Mr. Bruin's 6-stage solution block diagram:



Miss Bruin told Mr. Bruin that there is no need for any forwarding muxes in the EX2 and EX3 stages. Her design is given below:



You agree with (A/B) where A = Mr. Bruin's design, B = Miss Bruin's design, C = neither. If you chose neither, then complete your design below.

You agree with \_\_\_\_\_ (A/BC) where A = Mr. Bruin's design, B = Miss Bruin's design, C = neither. If you chose neither, then complete your design below.



# Adding support for a MOV instruction to the Q#1 of Fall 2021 MT

# Recollect MOV instruction implementation in Lab7 P3

**NOP** 

MOV  $R, X; (R) \le (X)$ 

SUB3 R, X; (R) <= (X) - 3

ADD4  $R, X; (R) \le (X) + 4$ 

ADD1  $R, X; (R) \le (X) + 1$ 



MOV R, W (meaning R  $\leq$  W)

MOV R, Y (meaning R <= Y)





**MOV** R, W (meaning R  $\leq$  W)

**MOV** R, Y (meaning R  $\leq$  Y)

MOV does not need to stall at all as it can postpone receiving help in a later stage (as late as EX3).

