CDL Modules
Files
file riscv_i32_alu.cdl
    ALU for i32 RISC-V implementation.
module riscv_i32_muldiv::riscv_i32_muldiv (
    clock clk,
    input bit reset_n,
    input t_riscv_i32_coproc_controls coproc_controls,
    output t_riscv_i32_coproc_response coproc_response,
    input t_riscv_config riscv_config
)

[in] riscv_config
Multiplication:
Consider multiplication of two 3-bit numbers X and Y (hence octal).
A straight (unsigned) view of a value X as Xs.Xb is Xb + 4*Xs (Xs is the sign bit, Xb the remaining bits).
A signed view of a value X as Xs.Xb is Xb - 4*Xs.
Hence one can consider Xsigned = Xunsigned - 8*Xs.
Consider Runsigned = Xunsigned * Yunsigned. Then:

  Xsigned * Ysigned = (Xunsigned - 8*Xs) * (Yunsigned - 8*Ys)
                    = (Xunsigned*Yunsigned) + 64*Xs*Ys - 8*(Xs*Yunsigned + Ys*Xunsigned)
                    = Runsigned - 8*(Xs*Yunsigned + Ys*Xunsigned)   (mod 64)
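The identity above can be checked exhaustively for the 3-bit case. The following is a minimal sketch in Python, not part of the CDL source; the helper names `to_signed` and `signed_product_via_correction` are hypothetical and exist only for this illustration.

```python
BITS = 3
MOD = 1 << (2 * BITS)  # products are kept modulo 64

def to_signed(x: int) -> int:
    """Interpret a 3-bit value as two's complement (7 == -1, 4 == -4)."""
    return x - (1 << BITS) * (x >> (BITS - 1))

def signed_product_via_correction(x: int, y: int) -> int:
    """Unsigned product plus the correction -8*(Xs*Yunsigned + Ys*Xunsigned), mod 64."""
    xs, ys = x >> (BITS - 1), y >> (BITS - 1)
    return (x * y - (1 << BITS) * (xs * y + ys * x)) % MOD

# Exhaustive check over all 64 operand pairs.
for x in range(8):
    for y in range(8):
        expected = (to_signed(x) * to_signed(y)) % MOD
        assert signed_product_via_correction(x, y) == expected
```

The check relies only on the 64*Xs*Ys term vanishing modulo 64, which is the step the derivation uses.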
Xunsigned * Yunsigned has the following multiplication table:
     0   1   2   3   4   5   6   7
0    0   0   0   0   0   0   0   0
1    0   1   2   3   4   5   6   7
2    0   2   4   6  10  12  14  16
3    0   3   6  11  14  17  22  25
4    0   4  10  14  20  24  30  34
5    0   5  12  17  24  31  36  43
6    0   6  14  22  30  36  44  52
7    0   7  16  25  34  43  52  61
If both are signed then we have the following correction to add -8*(Xs*Yunsigned + Ys*Xunsigned) (in decimal...)
      0    1    2    3    4    5    6    7
0     0    0    0    0    0    0    0    0
1     0    0    0    0   -8   -8   -8   -8
2     0    0    0    0  -16  -16  -16  -16
3     0    0    0    0  -24  -24  -24  -24
4     0   -8  -16  -24  -64  -72  -80  -88
5     0   -8  -16  -24  -72  -80  -88  -96
6     0   -8  -16  -24  -80  -88  -96 -104
7     0   -8  -16  -24  -88  -96 -104 -112
And in octal (addition)
     0   1   2   3   4   5   6   7
0    0   0   0   0   0   0   0   0
1    0   0   0   0  70  70  70  70
2    0   0   0   0  60  60  60  60
3    0   0   0   0  50  50  50  50
4    0  70  60  50   0  70  60  50
5    0  70  60  50  70  60  50  40
6    0  70  60  50  60  50  40  30
7    0  70  60  50  50  40  30  20
If they are both signed (7==-1, 6==-2, 5==-3, 4==-4) we have the following multiplication table:
     0   1   2   3   4   5   6   7
0    0   0   0   0   0   0   0   0
1    0   1   2   3  74  75  76  77
2    0   2   4   6  70  72  74  76
3    0   3   6  11  64  67  72  75
4    0  74  70  64  20  14  10   4
5    0  75  72  67  14  11   6   3
6    0  76  74  72  10   6   4   2
7    0  77  76  75   4   3   2   1
If the column (X) is unsigned and the row (Y) is signed then we have the following correction to add, -8*Ys*Xunsigned (in decimal...)
      0    1    2    3    4    5    6    7
0     0    0    0    0    0    0    0    0
1     0    0    0    0    0    0    0    0
2     0    0    0    0    0    0    0    0
3     0    0    0    0    0    0    0    0
4     0   -8  -16  -24  -32  -40  -48  -56
5     0   -8  -16  -24  -32  -40  -48  -56
6     0   -8  -16  -24  -32  -40  -48  -56
7     0   -8  -16  -24  -32  -40  -48  -56
And in octal (addition)
     0   1   2   3   4   5   6   7
0    0   0   0   0   0   0   0   0
1    0   0   0   0   0   0   0   0
2    0   0   0   0   0   0   0   0
3    0   0   0   0   0   0   0   0
4    0  70  60  50  40  30  20  10
5    0  70  60  50  40  30  20  10
6    0  70  60  50  40  30  20  10
7    0  70  60  50  40  30  20  10
Hence the multiplication table:
     0   1   2   3   4   5   6   7
0    0   0   0   0   0   0   0   0
1    0   1   2   3   4   5   6   7
2    0   2   4   6  10  12  14  16
3    0   3   6  11  14  17  22  25
4    0  74  70  64  60  54  50  44
5    0  75  72  67  64  61  56  53
6    0  76  74  72  70  66  64  62
7    0  77  76  75  74  73  72  71
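The octal tables above can be regenerated mechanically, which is a convenient way to cross-check them. This is a sketch in Python under the same 3-bit, modulo-64 assumptions as the text; `mul_table` and `format_octal` are hypothetical helper names, not anything from the CDL source.

```python
def to_signed(v: int, bits: int = 3) -> int:
    """Interpret a bits-wide value as two's complement."""
    return v - (1 << bits) if (v >> (bits - 1)) & 1 else v

def mul_table(row_signed: bool, col_signed: bool, bits: int = 3):
    """table[row][col] = row * col modulo 2**(2*bits), each operand optionally signed."""
    n, mod = 1 << bits, 1 << (2 * bits)
    table = []
    for r in range(n):
        rv = to_signed(r, bits) if row_signed else r
        line = []
        for c in range(n):
            cv = to_signed(c, bits) if col_signed else c
            line.append((rv * cv) % mod)
        table.append(line)
    return table

def format_octal(table) -> str:
    """Render a table with each entry in octal, right-aligned in two columns."""
    return "\n".join(" ".join(f"{v:>2o}" for v in row) for row in table)

# Signed row, unsigned column: the last table above.
print(format_octal(mul_table(row_signed=True, col_signed=False)))
```

Calling `mul_table(False, False)` and `mul_table(True, True)` gives the unsigned and both-signed tables in the same way.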
Hence the multiplication of two 32-bit numbers X and Y, using a 64-bit accumulator A, can be performed by setting A initially to:
0 for unsigned*unsigned
-2^32*(X[31]?Y) for X signed Y unsigned
-2^32*(Y[31]?X) for Y signed X unsigned
-2^32*(Y[31]?X[31;0] + X[31]?Y[31;0]) for both signed
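The four initial values can be exercised against ordinary integer arithmetic. The sketch below is an illustrative Python model, not the CDL implementation; the function names are hypothetical, operands are passed as 32-bit patterns, and it assumes X[31;0] denotes the low 31 bits of X.

```python
MASK64 = (1 << 64) - 1

def as_signed32(v: int) -> int:
    """Interpret a 32-bit pattern as two's complement."""
    return v - (1 << 32) if (v >> 31) & 1 else v

def initial_accumulator(x: int, y: int, x_signed: bool, y_signed: bool) -> int:
    """Preload value for the 64-bit accumulator A, following the four cases above."""
    x_neg = x_signed and bool((x >> 31) & 1)
    y_neg = y_signed and bool((y >> 31) & 1)
    # Full 32-bit operands are used here. When both sign bits are set the two
    # bit-31 terms sum to 2^32, which vanishes modulo 2^64 after scaling by 2^32,
    # so this agrees with the low-31-bit form given for the both-signed case.
    correction = (y if x_neg else 0) + (x if y_neg else 0)
    return (-(correction << 32)) & MASK64

def multiply(x: int, y: int, x_signed: bool, y_signed: bool) -> int:
    """64-bit product: the unsigned product accumulated on top of the preload."""
    return (initial_accumulator(x, y, x_signed, y_signed) + x * y) & MASK64

def reference(x: int, y: int, x_signed: bool, y_signed: bool) -> int:
    """Same product computed directly from the signed interpretations."""
    a = as_signed32(x) if x_signed else x
    b = as_signed32(y) if y_signed else y
    return (a * b) & MASK64
```

For example, `multiply(0xFFFFFFFF, 0xFFFFFFFF, True, True)` models (-1)*(-1), and the top or bottom 32 bits of the result would then be selected as required.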
The operation of the multiply then requires a 64-bit accumulator
+1 +4 provides 0, 1, 4, 5 (single 35-bit adder):
  stage1_0 = 0;
  stage1_1 = (3b0,in);
  stage1_4 = (1b0,in,2b0);
  stage1_5 = stage1_1 + stage1_4;

+1 +4 with optional double provides 0, 2, 8, 10, 1, 3, 9, 11, 4, 6, 12, 14, 5, 7, 13, 15 (one more 36-bit adder):
  0 = 0+0; 2 = 0+stage1_1_dbl; 3 = stage1_1+stage1_1_dbl;
  stage2_add_in_0 = mux(stage1_0, stage1_1, stage1_4, stage1_5)
  stage2_add_in_1 = mux(stage1_0, stage1_1, stage1_4, stage1_5)<<1
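That every 4-bit multiple is reachable as one stage-1 value plus a doubled stage-1 value can be confirmed with a short model. This is a Python sketch only; the mapping of nibble bits to mux selects in `nibble_selects` is an assumption made for illustration and is not stated in the source.

```python
STAGE1 = (0, 1, 4, 5)  # 0, in, in<<2, in + (in<<2); only the last needs the stage-1 adder

def nibble_selects(nibble: int):
    """Return (a, b), both from STAGE1, such that a + 2*b == nibble."""
    a = (nibble & 1) + 4 * ((nibble >> 2) & 1)         # bits 0 and 2 pick the undoubled term
    b = ((nibble >> 1) & 1) + 4 * ((nibble >> 3) & 1)  # bits 1 and 3 pick the doubled term
    return a, b

def nibble_multiple(value: int, nibble: int) -> int:
    """Form value * nibble from stage-1 candidates and one further addition."""
    stage1 = {0: 0, 1: value, 4: value << 2, 5: value + (value << 2)}
    a, b = nibble_selects(nibble)
    stage2_add_in_0 = stage1[a]
    stage2_add_in_1 = stage1[b] << 1
    return stage2_add_in_0 + stage2_add_in_1

# All sixteen multiples are covered by the two selections.
for n in range(16):
    a, b = nibble_selects(n)
    assert a in STAGE1 and b in STAGE1
    assert nibble_multiple(12345, n) == 12345 * n
```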
Division
Unsigned division/remainder (column / row)
      0    1    2    3    4    5    6    7
0   7/0  7/0  7/0  7/0  7/0  7/0  7/0  7/0
1   0/0  1/0  2/0  3/0  4/0  5/0  6/0  7/0
2   0/0  0/1  1/0  1/1  2/0  2/1  3/0  3/1
3   0/0  0/1  0/2  1/0  1/1  1/2  2/0  2/1
4   0/0  0/1  0/2  0/3  1/0  1/1  1/2  1/3
5   0/0  0/1  0/2  0/3  0/4  1/0  1/1  1/2
6   0/0  0/1  0/2  0/3  0/4  0/5  1/0  1/1
7   0/0  0/1  0/2  0/3  0/4  0/5  0/6  1/0
Signed division/remainder (column / row) (x86 except div by 0)
Note: x86, C99 - sign of remainder = sign of dividend
      0    1    2    3    4    5    6    7
0   7/0  7/0  7/0  7/0  7/0  7/0  7/0  7/0
1   0/0  1/0  2/0  3/0  4/0  5/0  6/0  7/0
2   0/0  0/1  1/0  1/1  6/0  7/7  7/0  0/7
3   0/0  0/1  0/2  1/0  7/7  7/0  0/6  0/7
4   0/0  0/1  0/2  0/3  1/0  0/5  0/6  0/7
5   0/0  0/1  0/2  7/0  1/7  1/0  0/6  0/7
6   0/0  0/1  7/0  7/1  2/0  1/7  1/0  0/7
7   0/0  7/0  6/0  5/0  4/0  3/0  2/0  1/0
For positive/positive one can use unsigned division directly.
For negative/negative one can do -d/-r, and negate the remainder.
For negative/positive one can do -d/r, then negate the result and the remainder.
For positive/negative one can do d/-r, then negate the result.
So the first cycle of a divide prepares 'positive' d and r and records the signs (as required)
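The sign handling described above can be sketched as a wrapper around an unsigned divide. This is an illustrative Python model with hypothetical names, not the CDL implementation; operands are bit patterns of width `bits`, Python's `divmod` stands in for the unsigned divider, and division by zero is left out because the tables treat it as a separate case.

```python
def signed_divrem(dividend: int, divisor: int, bits: int = 32):
    """Signed quotient and remainder from one unsigned divide on magnitudes.

    The divisor must be non-zero. Results are returned as bit patterns.
    """
    mask = (1 << bits) - 1
    d_neg = bool((dividend >> (bits - 1)) & 1)
    r_neg = bool((divisor >> (bits - 1)) & 1)
    # First cycle: prepare 'positive' operands and record the signs.
    d_abs = (-dividend) & mask if d_neg else dividend
    r_abs = (-divisor) & mask if r_neg else divisor
    quotient, remainder = divmod(d_abs, r_abs)
    if d_neg != r_neg:   # signs differ: negate the result
        quotient = -quotient
    if d_neg:            # remainder takes the sign of the dividend
        remainder = -remainder
    return quotient & mask, remainder & mask
```

With `bits=3` the function can be compared against the signed table above; for instance column 5, row 2 is `signed_divrem(5, 2, bits=3)`.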
The multiply requires three adders: one 34-bit, one 36-bit and one 64-bit. Divide requires a compare; it could do 3 compares per cycle, or just one to start with.
We have a multiplier register that gets shifted; this can be the divisor that gets shifted
Multiply then occurs with the following states:
Init: adder high 0 is zero or abs(a) if signed and a negative; adder high 1 is zero or abs(b) if signed and b negative; a_reg <= rs1, b_reg <= rs2
Step (until completed): a_reg = a_reg>>4; mult = a_reg&15; shf = stage; adder is shifter + acc, with carry chain. Complete if a_reg&15 will be 0
Result valid: accumulator has result (present top or bottom half)
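A behavioural reading of the Step state, consuming four multiplier bits per cycle, might look like the following Python sketch. It is an interpretation rather than the CDL source: the function name is hypothetical, the sign preload is passed in as `acc_init` instead of being derived here, and the completion test (no multiplier bits left in a_reg) is an assumption based on the terse description above.

```python
MASK64 = (1 << 64) - 1

def multiply_by_nibbles(rs1: int, rs2: int, acc_init: int = 0):
    """Return (64-bit accumulator, number of Step cycles) for rs1 * rs2 + acc_init."""
    a_reg, b_reg = rs1, rs2            # Init: a_reg <= rs1, b_reg <= rs2
    acc = acc_init & MASK64
    stage = 0
    while a_reg != 0:                  # assumed completion: multiplier exhausted
        mult = a_reg & 15              # next four multiplier bits
        shifted = (b_reg * mult) << (4 * stage)
        acc = (acc + shifted) & MASK64 # adder is shifter + acc
        a_reg >>= 4
        stage += 1
    return acc, stage
```

Under this reading a small multiplier finishes early: `multiply_by_nibbles(0x12, 0xFFFFFFFF)` takes two steps, while a multiplier with bits set in its top nibble takes eight.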
Divide occurs with the following states:
Init
Shift
Step (until completed)
Result valid (provides result signing stuff)
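One plausible reading of these states is a shift-and-subtract divider on magnitudes: Shift aligns the divisor with the dividend, and each Step does one compare with a conditional subtract. The Python sketch below illustrates that reading only; the source does not spell out the alignment rule or the per-step behaviour, so both are assumptions, as is the function name.

```python
def divide_unsigned(dividend: int, divisor: int):
    """Shift/subtract division on magnitudes; returns (quotient, remainder, steps)."""
    if divisor == 0:
        raise ZeroDivisionError("divide by zero is a separate case in the tables")
    remainder, quotient = dividend, 0            # Init
    shift = 0
    while (divisor << (shift + 1)) <= remainder: # Shift: align divisor under the dividend
        shift += 1
    steps = 0
    while shift >= 0:                            # Step: one compare per cycle
        if (divisor << shift) <= remainder:
            remainder -= divisor << shift
            quotient |= 1 << shift
        shift -= 1
        steps += 1
    return quotient, remainder, steps            # Result valid (signs applied afterwards)
```

Aligning first means the number of Step cycles tracks the size of the quotient rather than always being 32.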
Hence the design is for a data pipeline with (for multiply):
a_reg - contains multiplier
b_reg - contains multiplicand
accumulator - contains 64-bit result of multiply
Result is accumulator - pick top or bottom 32 bits as required
For divide it becomes:
a_reg - contains abs(a) (if signed) else a (dividend)
b_reg - contains -abs(b) (if signed) else -b (divisor)
accumulator - bottom 32 bits contains remainder (initialized to a_reg)
Quotient result is accumulator high, or negated accumulator high if signed and signs of two inputs differ.
Remainder result is accumulator low, or negated accumulator low if signed and dividend input was negative.