Files
file	riscv_i32_alu.cdl
	ALU for i32 RISC-V implementation.

Detailed Description

Modules

module riscv_i32_muldiv::riscv_i32_muldiv	(	clock	clk,
		input bit	reset_n,
		input t_riscv_i32_coproc_controls	coproc_controls,
		output t_riscv_i32_coproc_response	coproc_response,
		input t_riscv_config	riscv_config
	)

Parameters

[in] riscv_config

Multiplication:

Consider multiplication of two 3-bit numbers X and Y (hence octal).

A straight (unsigned) view of a value X as Xs.Xb is Xb+4*Xs (Xs is the sign bit, Xb the remaining bits).
A signed view of a value X as Xs.Xb is Xb-4*Xs.
Hence one can consider Xsigned = Xunsigned - 8*Xs.

Consider Runsigned = Xunsigned * Yunsigned. Then:

  Xsigned * Ysigned = (Xunsigned - 8*Xs) * (Yunsigned - 8*Ys)
                    = (Xunsigned*Yunsigned) + 64*Xs*Ys - 8*(Xs*Yunsigned + Ys*Xunsigned)
                    = Runsigned - 8*(Xs*Yunsigned + Ys*Xunsigned)   (mod 64)
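The identity above can be checked exhaustively for the 3-bit case. The following is a minimal Python sketch, not part of the CDL source; the function names (to_signed, corrected_product, check_all) are hypothetical and chosen only for illustration.

```python
def to_signed(value, bits=3):
    """Interpret an unsigned `bits`-wide value as two's complement."""
    sign = value >> (bits - 1)
    return value - (sign << bits)


def corrected_product(x, y, bits=3):
    """Unsigned product plus the sign correction, modulo 2^(2*bits)."""
    xs = x >> (bits - 1)  # sign bit of X
    ys = y >> (bits - 1)  # sign bit of Y
    r_unsigned = x * y
    correction = -(1 << bits) * (xs * y + ys * x)
    return (r_unsigned + correction) % (1 << (2 * bits))


def check_all(bits=3):
    """Compare against a direct signed multiply for every operand pair."""
    modulus = 1 << (2 * bits)
    for x in range(1 << bits):
        for y in range(1 << bits):
            expected = (to_signed(x, bits) * to_signed(y, bits)) % modulus
            if corrected_product(x, y, bits) != expected:
                return False
    return True
```

With bits=3 the modulus is 64 and the correction factor is 8, matching the derivation; check_all(3) walks all 64 operand pairs.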

Xunsigned * Yunsigned has the following multiplication table:

     0    1    2    3    4    5    6    7

0    0    0    0    0    0    0    0    0
1    0    1    2    3    4    5    6    7
2    0    2    4    6   10   12   14   16
3    0    3    6   11   14   17   22   25
4    0    4   10   14   20   24   30   34
5    0    5   12   17   24   31   36   43
6    0    6   14   22   30   36   44   52
7    0    7   16   25   34   43   52   61

If both are signed then we have the following correction to add -8*(Xs*Yunsigned + Ys*Xunsigned) (in decimal...)

     0    1    2    3    4    5    6    7

0    0    0    0    0    0    0    0    0
1    0    0    0    0   -8   -8   -8   -8
2    0    0    0    0  -16  -16  -16  -16
3    0    0    0    0  -24  -24  -24  -24
4    0   -8  -16  -24  -64  -72  -80  -88
5    0   -8  -16  -24  -72  -80  -88  -96
6    0   -8  -16  -24  -80  -88  -96 -104
7    0   -8  -16  -24  -88  -96 -104 -112

And in octal (addition)

     0    1    2    3    4    5    6    7

0    0    0    0    0    0    0    0    0
1    0    0    0    0   70   70   70   70
2    0    0    0    0   60   60   60   60
3    0    0    0    0   50   50   50   50
4    0   70   60   50    0   70   60   50
5    0   70   60   50   70   60   50   40
6    0   70   60   50   60   50   40   30
7    0   70   60   50   50   40   30   20

If they are both signed (7==-1, 6==-2, 5==-3, 4==-4) we have the following multiplication table:

     0    1    2    3    4    5    6    7

0    0    0    0    0    0    0    0    0
1    0    1    2    3   74   75   76   77
2    0    2    4    6   70   72   74   76
3    0    3    6   11   64   67   72   75
4    0   74   70   64   20   14   10    4
5    0   75   72   67   14   11    6    3
6    0   76   74   72   10    6    4    2
7    0   77   76   75    4    3    2    1

If the column (X) is unsigned and the row (Y) is signed then we have the following correction to add, -8*Ys*Xunsigned (in decimal...)

     0    1    2    3    4    5    6    7

0    0    0    0    0    0    0    0    0
1    0    0    0    0    0    0    0    0
2    0    0    0    0    0    0    0    0
3    0    0    0    0    0    0    0    0
4    0   -8  -16  -24  -32  -40  -48  -56
5    0   -8  -16  -24  -32  -40  -48  -56
6    0   -8  -16  -24  -32  -40  -48  -56
7    0   -8  -16  -24  -32  -40  -48  -56

And in octal (addition)

     0    1    2    3    4    5    6    7

0    0    0    0    0    0    0    0    0
1    0    0    0    0    0    0    0    0
2    0    0    0    0    0    0    0    0
3    0    0    0    0    0    0    0    0
4    0   70   60   50   40   30   20   10
5    0   70   60   50   40   30   20   10
6    0   70   60   50   40   30   20   10
7    0   70   60   50   40   30   20   10

Hence the multiplication table:

     0    1    2    3    4    5    6    7

0    0    0    0    0    0    0    0    0
1    0    1    2    3    4    5    6    7
2    0    2    4    6   10   12   14   16
3    0    3    6   11   14   17   22   25
4    0   74   70   64   60   54   50   44
5    0   75   72   67   64   61   56   53
6    0   76   74   72   70   66   64   62
7    0   77   76   75   74   73   72   71
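The mixed table (unsigned column X, signed row Y) can be regenerated by adding the correction -8*Ys*Xunsigned to the unsigned product and reducing modulo 64. This is an illustrative Python sketch only; mixed_table and format_row are hypothetical helper names.

```python
def mixed_table(bits=3):
    """Rows are signed Y, columns are unsigned X; entries are mod 2^(2*bits)."""
    size = 1 << bits
    modulus = 1 << (2 * bits)
    table = []
    for y in range(size):
        ys = y >> (bits - 1)  # sign bit of the row operand
        # Unsigned product plus the correction -2^bits * Ys * Xunsigned.
        row = [(x * y - size * ys * x) % modulus for x in range(size)]
        table.append(row)
    return table


def format_row(row):
    """Render one row in octal, space separated."""
    return " ".join(format(value, "o") for value in row)
```

For example, format_row(mixed_table()[4]) reproduces the row for Y = 4 (that is, -4) in the table above.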

Hence the multiplication of two 32-bit numbers X and Y, using a 64-bit accumulator A, can be performed by setting A initially to:

  0                                        for unsigned*unsigned
  -2^32*(X[31]?Y)                          for X signed, Y unsigned
  -2^32*(Y[31]?X)                          for Y signed, X unsigned
  -2^32*(Y[31]?X[31;0] + X[31]?Y[31;0])    for both signed.
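The effect of these initial accumulator values can be modelled in software. The sketch below is a Python approximation and not the CDL implementation. It assumes that S?V means "V if S is set, else 0" and that X[31;0] denotes the low 31 bits of X; all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def as_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value >> 31 else value


def initial_accumulator(x, y, x_signed, y_signed):
    """Initial 64-bit accumulator value for the four signedness cases."""
    xs = (x >> 31) & 1 if x_signed else 0
    ys = (y >> 31) & 1 if y_signed else 0
    if x_signed and y_signed:
        # Both signed: only the low 31 bits of each operand are used
        # (assumed reading of X[31;0] and Y[31;0]).
        term = ys * (x & 0x7FFFFFFF) + xs * (y & 0x7FFFFFFF)
    else:
        term = xs * y + ys * x
    return (-(term << 32)) & MASK64


def multiply_64(x, y, x_signed, y_signed):
    """Unsigned 32x32 product added to the initial accumulator, mod 2^64."""
    return (initial_accumulator(x, y, x_signed, y_signed) + x * y) & MASK64
```

The top or bottom 32 bits of multiply_64(...) then correspond to the high or low half of the product for the chosen signedness.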

The operation of the multiply then requires a 64-bit accumulator

+1 +4 provides 0, 1, 4, 5 (single 35-bit adder):

  stage1_0 = 0;
  stage1_1 = (3b0,in);
  stage1_4 = (1b0,in,2b0);
  stage1_5 = stage1_1 + stage1_4;

+1 +4 with optional double provides 0, 2, 8, 10, 1, 3, 9, 11, 4, 6, 12, 14, 5, 7, 13, 15 (one more 36-bit adder):

  0 = 0+0; 2 = 0+stage1_1_dbl; 3 = stage1_1+stage1_1_dbl;
  stage2_add_in_0 = mux(stage1_0, stage1_1, stage1_4, stage1_5)
  stage2_add_in_1 = mux(stage1_0, stage1_1, stage1_4, stage1_5)<<1
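To see why two adders are enough for any 4-bit multiple, note that every value 0..15 is a + 2*b with a and b each drawn from {0, 1, 4, 5}. The following Python sketch illustrates this; the mapping of nibble bits to mux selects is an assumption made for the example, as the text does not spell it out.

```python
def stage1_multiples(value):
    """The four first-stage candidates: 0, 1x, 4x and 5x the input."""
    stage1_0 = 0
    stage1_1 = value
    stage1_4 = value << 2
    stage1_5 = stage1_1 + stage1_4  # the first adder
    return (stage1_0, stage1_1, stage1_4, stage1_5)


def nibble_multiple(value, nibble):
    """value * nibble (nibble in 0..15) using one further addition."""
    stage1 = stage1_multiples(value)
    # Assumed select mapping: nibble bits 0 and 2 choose the undoubled input,
    # nibble bits 1 and 3 choose the input that is then doubled.
    select_0 = (nibble & 1) | ((nibble >> 1) & 2)
    select_1 = ((nibble >> 1) & 1) | ((nibble >> 2) & 2)
    stage2_add_in_0 = stage1[select_0]
    stage2_add_in_1 = stage1[select_1] << 1
    return stage2_add_in_0 + stage2_add_in_1  # the second adder
```

For instance, a nibble of 11 (binary 1011) selects stage1_1 undoubled and stage1_5 doubled, giving 1 + 10 = 11 times the input.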

Division

Unsigned division/remainder (column / row)

     0    1    2    3    4    5    6    7

0  7/0  7/0  7/0  7/0  7/0  7/0  7/0  7/0
1  0/0  1/0  2/0  3/0  4/0  5/0  6/0  7/0
2  0/0  0/1  1/0  1/1  2/0  2/1  3/0  3/1
3  0/0  0/1  0/2  1/0  1/1  1/2  2/0  2/1
4  0/0  0/1  0/2  0/3  1/0  1/1  1/2  1/3
5  0/0  0/1  0/2  0/3  0/4  1/0  1/1  1/2
6  0/0  0/1  0/2  0/3  0/4  0/5  1/0  1/1
7  0/0  0/1  0/2  0/3  0/4  0/5  0/6  1/0

Signed division/remainder (column / row) (x86 except div by 0)
Note: x86, C99 - sign of remainder = sign of dividend

     0    1    2    3    4    5    6    7

0  7/0  7/0  7/0  7/0  7/0  7/0  7/0  7/0
1  0/0  1/0  2/0  3/0  4/0  5/0  6/0  7/0
2  0/0  0/1  1/0  1/1  6/0  7/7  7/0  0/7
3  0/0  0/1  0/2  1/0  7/7  7/0  0/6  0/7
4  0/0  0/1  0/2  0/3  1/0  0/5  0/6  0/7
5  0/0  0/1  0/2  7/0  1/7  1/0  0/6  0/7
6  0/0  0/1  7/0  7/1  2/0  1/7  1/0  0/7
7  0/0  7/0  6/0  5/0  4/0  3/0  2/0  1/0

For positive/positive one can use unsigned division directly.
For negative/negative one can do -d/-r, and negate the remainder.
For negative/positive one can do -d/r, then negate the result and the remainder.
For positive/negative one can do d/-r, then negate the result.
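The four sign cases can be exercised against the 3-bit signed table. This is a small Python sketch under the assumption that d is the dividend (column) and r the divisor (row); signed_divide and table_row are hypothetical names, and divide by zero is deliberately left out of the sketch.

```python
def signed_divide(dividend, divisor, bits=3):
    """Signed divide of two's-complement bit patterns via unsigned division."""
    mask = (1 << bits) - 1
    if divisor == 0:
        raise ValueError("divide by zero is not covered by this sketch")
    d_negative = (dividend >> (bits - 1)) & 1
    r_negative = (divisor >> (bits - 1)) & 1
    # First step: make both operands 'positive' and remember the signs.
    d_abs = (-dividend) & mask if d_negative else dividend
    r_abs = (-divisor) & mask if r_negative else divisor
    quotient, remainder = divmod(d_abs, r_abs)  # plain unsigned division
    if d_negative != r_negative:
        quotient = -quotient    # signs differ: negate the result
    if d_negative:
        remainder = -remainder  # remainder takes the sign of the dividend
    return quotient & mask, remainder & mask


def table_row(divisor, bits=3):
    """One row of the signed table, entries as octal quotient/remainder."""
    return " ".join("%o/%o" % signed_divide(column, divisor, bits)
                    for column in range(1 << bits))
```

table_row(2) and table_row(5) reproduce rows 2 and 5 of the signed table above.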

So the first cycle of a divide prepares 'positive' d and r and records the signs (as required)

The multiply requires three adders: one 34-bit, one 36-bit, and one 64-bit. Divide requires a compare; it could do 3 compares per cycle, or just one to start with.

We have a multiplier register that gets shifted; this can be the divisor that gets shifted

Multiply then occurs with the following states:

Init : adder high 0 is zero or abs(a) if signed and a negative; adder high 1 is zero or abs(b) if signed and b negative; a_reg <= rs1, b_reg <= rs2
Step (until completed) : a_reg = a_reg>>4; mult=a_reg&15; shf=stage; adder is shifter + acc, with carry chain. Complete if a_reg&15 will be 0
Result valid : accumulator has result (present top or bottom half)

Divide occurs with the following states:

Init
Shift
Step (until completed)
Result valid (provides result signing stuff)

Hence the design is for a data pipeline with (for multiply):

a_reg - contains multiplier
b_reg - contains multiplicand
accumulator - contains 64-bit result of multiply
  initialize with:
    0 for unsigned*unsigned;
    -2^32*(X[31]?Y) for X signed Y unsigned;
    -2^32*(Y[31]?X) for Y signed X unsigned;
    -2^32*(Y[31]?X[31;0] + X[31]?Y[31;0]) for both signed
mult_data = b_reg * bottom 4 bits of a_reg
mult_shf = b_reg * bottom 4 bits of a_reg shifted to be in correct position for accumulation (i.e. shift left by 4*stage)
64-bit adder of accumulator plus mult_shf

Result is accumulator - pick top or bottom 32 bits as required
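The multiply data path described above can be modelled as a loop that consumes four bits of a_reg per step. This Python sketch is an approximation of the behaviour, not the CDL source. It assumes the step loop finishes once the remaining a_reg is zero, and takes the initial accumulator value as a parameter (0 for unsigned*unsigned).

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def multiply_steps(rs1, rs2, initial_accumulator=0):
    """Return (accumulator, number of steps taken) for rs1 * rs2."""
    a_reg = rs1 & MASK32  # multiplier, shifted right 4 bits per step
    b_reg = rs2 & MASK32  # multiplicand
    accumulator = initial_accumulator & MASK64
    stage = 0
    while a_reg != 0:  # assumed completion test: no multiplier bits remain
        mult_data = b_reg * (a_reg & 15)
        mult_shf = mult_data << (4 * stage)  # position for accumulation
        accumulator = (accumulator + mult_shf) & MASK64
        a_reg >>= 4
        stage += 1
    return accumulator, stage


def result_half(accumulator, high):
    """Pick the top or bottom 32 bits of the accumulator."""
    return (accumulator >> 32) & MASK32 if high else accumulator & MASK32
```

A small multiplier finishes early: multiply_steps(0x12, 0x34) takes two steps, while a multiplier with a non-zero top nibble takes all eight.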

For divide it becomes:

a_reg - contains abs(a) (if signed) else a (dividend)
b_reg - contains -abs(b) (if signed) else -b (divisor)
accumulator - bottom 32 bits contains remainder (initialized to a_reg);
  top 32 bits contain quotient (initialized to zero)
mult_data = b_reg << bottom 2 bits of stage
mult_shf = b_reg shifted to be in correct position for subtraction from remainder
32-bit adder of low accumulator plus mult_shf; if >=0 then must update accumulator low and set bit in accumulator high
32-bit 'set bit N of' of high accumulator to build quotient if

Quotient result is accumulator high, or negated accumulator high if signed and signs of two inputs differ.
Remainder result is accumulator low, or negated accumulator low if signed and dividend input was negative.
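The core of the divide data path, before any final negation, is a shift-and-subtract loop over the accumulator halves. The Python sketch below models only that unsigned core, one quotient bit per step from the top bit down; the hardware's shift state, the 2-bit staging of mult_data, and the sign handling are omitted. The name divide_unsigned is hypothetical.

```python
def divide_unsigned(dividend, divisor, bits=32):
    """Return (quotient, remainder) by trial subtraction of shifted divisor."""
    if divisor == 0:
        raise ValueError("divide by zero is not covered by this sketch")
    acc_low = dividend  # remainder, initialized to the dividend
    acc_high = 0        # quotient, initialized to zero
    for n in reversed(range(bits)):
        # Subtract the divisor shifted into position n (the text adds the
        # negated divisor held in b_reg, which is the same operation).
        trial = acc_low - (divisor << n)
        if trial >= 0:
            acc_low = trial       # keep the reduced remainder
            acc_high |= 1 << n    # set bit N of the quotient
    return acc_high, acc_low
```

For example, divide_unsigned(100, 7) leaves 14 in the quotient half and 2 in the remainder half.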
