To further reduce the computational cost of floating-point or integer multiplications, BitNet introduces BitLinear layers with binary (1-bit), ternary (1.58-bit), or quaternary (2-bit) weights. With such low-bit weights, the multiplications in a matrix multiplication (MatMul) reduce to simple additions and subtractions. For example, with ternary weights $W_{ij} \in \{-1, 0, +1\}$, the MatMul $y = Wx$ can be written as:

$$
y_i = \sum_j W_{ij} x_j = \sum_{j:\,W_{ij}=+1} x_j \;-\; \sum_{j:\,W_{ij}=-1} x_j
$$
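As a small worked example (the numbers are illustrative, not taken from the thesis), take one output row $i$ with inputs $x = (3, -2, 5, 1)$ and ternary weights $(W_{i1}, W_{i2}, W_{i3}, W_{i4}) = (+1, 0, -1, +1)$:

$$
y_i = \sum_{j=1}^{4} W_{ij} x_j = (3 + 1) - 5 = -1
$$

The input with weight $0$ is simply skipped, the inputs with weight $+1$ are added, and the input with weight $-1$ is subtracted, so no multiplier is needed.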

Each ternary weight carries $\log_2 3 \approx 1.58$ bits of information, which is where the name "1.58-bit" comes from.

The other two schemes, binary and quaternary, map exactly onto 1-bit and 2-bit fields, so they are already optimal for storage.
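A quick way to see this (a back-of-the-envelope calculation, not a result from the thesis) is to compare the information content $\log_2 k$ of a weight with $k$ levels against the number of bits it occupies:

$$
\log_2 2 = 1, \qquad \log_2 4 = 2, \qquad \log_2 3 \approx 1.585
$$

Binary and quaternary weights fill their 1-bit and 2-bit fields exactly, whereas a ternary weight stored in a 2-bit field uses only about $1.585 / 2 \approx 79\%$ of those bits.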

Modern computers store and address data in bytes of 8 bits. Storing ternary weights at their native 1.58-bit width would therefore require additional circuitry or software support to pack and unpack them, so this thesis stores each ternary weight in a 2-bit format.
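To give a feel for that extra circuitry, here is a hypothetical decoder for a denser packing that fits five ternary weights into one byte ($3^5 = 243 \le 256$, i.e. 1.6 bits per weight). The module name `ternary5_decode`, the least-significant-digit-first order, and the digit-to-weight mapping are assumptions for illustration only. The output uses the 2-bit codes consumed by the AMux described below (00 = +1, 01 = -1, 10 = 0).

```verilog
// Hypothetical decoder (illustration only): one byte packs five ternary
// weights as base-3 digits, least significant digit first.
// Digit d in {0,1,2} stands for weight d - 1 in {-1, 0, +1}.
// Byte values 243..255 are unused by this encoding.
// Each weight is re-encoded as a 2-bit code:
//   +1 -> 2'b00, -1 -> 2'b01, 0 -> 2'b10
module ternary5_decode(input [7:0] packed_w, output reg [9:0] codes);
  integer i;
  reg [7:0] rem;  // part of the byte not yet decoded
  reg [1:0] d;    // current base-3 digit
  always @(*) begin
    rem   = packed_w;
    codes = 10'd0;
    for (i = 0; i < 5; i = i + 1) begin
      d   = rem % 3;  // extract the lowest base-3 digit
      rem = rem / 3;  // move on to the next digit
      case (d)
        2'd0:    codes[2*i +: 2] = 2'b01; // weight -1
        2'd1:    codes[2*i +: 2] = 2'b10; // weight  0
        default: codes[2*i +: 2] = 2'b00; // weight +1
      endcase
    end
  end
endmodule
```

For example, the byte value 140 decodes to the weights (+1, 0, -1, +1, 0), since its base-3 digits are (2, 1, 0, 2, 1). Even this five-weight case needs a chain of divide-by-3 and modulo-3 stages, whereas the 2-bit format only slices the byte into four fixed fields.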

To support these different quantization schemes, this thesis designs a core functional unit:

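Depending on the 2-bit weight code $w$, the multiplexer selects a different output: $x$ for 00 (weight $+1$), $-x$ for 01 (weight $-1$), $0$ for 10 (weight $0$), and $-2x$ for 11 (weight $-2$).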

The SUM4 unit processes four inputs. Each input passes through its own AMux, and the four AMux outputs are combined by the adder tree shown in the figure, producing a 32-bit output, i.e. four 8-bit vector elements.
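A minimal sketch of one SUM4 lane, under assumptions the figure would settle: the four 8-bit activations are packed into a 32-bit word `x`, the four 2-bit weight codes into an 8-bit word `w`, and the adder tree works in 8-bit wraparound arithmetic. The module name `sum4_lane`, its ports, and the inline `amux` function (which mirrors the AMux encoding 00 -> +x, 01 -> -x, 10 -> 0, 11 -> -2x) are hypothetical, not the thesis's RTL.

```verilog
// Hypothetical SUM4 lane (sketch, not the thesis RTL):
// four signed 8-bit activations in x, four 2-bit weight codes in w,
// summed by a two-level adder tree with 8-bit wraparound.
module sum4_lane(input [31:0] x, input [7:0] w, output [7:0] sum);
  // Same encoding as the AMux: 00 -> +x, 01 -> -x, 10 -> 0, 11 -> -2x
  function [7:0] amux(input [7:0] xi, input [1:0] wi);
    case (wi)
      2'b00:   amux = xi;
      2'b01:   amux = -xi;
      2'b10:   amux = 8'd0;
      default: amux = -(xi << 1);
    endcase
  endfunction

  // One AMux per input byte
  wire [7:0] p0 = amux(x[ 7: 0], w[1:0]);
  wire [7:0] p1 = amux(x[15: 8], w[3:2]);
  wire [7:0] p2 = amux(x[23:16], w[5:4]);
  wire [7:0] p3 = amux(x[31:24], w[7:6]);

  // Two-level adder tree: two pairwise sums, then the final sum
  wire [7:0] s01 = p0 + p1;
  wire [7:0] s23 = p2 + p3;
  assign sum = s01 + s23;
endmodule
```

For example, with activations (3, 10, 5, 7) and weight codes (00, 01, 10, 11), the lane computes 3 - 10 + 0 - 14 = -21, i.e. 8'hEB. Summing pairwise keeps the critical path at two adders, instead of three for a linear chain.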

This figure shows how BNRV is implemented:

Here is the Verilog implementation of the AMux:

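```verilog
// AMux: scales an 8-bit two's-complement input x by the weight encoded in w.
//   w = 2'b00 -> +x, 2'b01 -> -x, 2'b10 -> 0, 2'b11 -> -2x
module BNRV_4Mux(input [7:0] x, input [1:0] w, output [7:0] o);
  reg [7:0] t_o;
  always @(*) begin
    case (w)
      2'b00:   t_o = x;                // weight +1
      2'b01:   t_o = ~x + 1'b1;        // weight -1 (two's-complement negation)
      2'b10:   t_o = 8'd0;             // weight 0
      2'b11:   t_o = (~x + 1'b1) << 1; // weight -2 (negate, then shift left)
      default: t_o = 8'd0;             // w contains X/Z (simulation only)
    endcase
  end
  assign o = t_o;
endmodule
```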