BNRV(2)

To further reduce the computational burden of floating-point or integer multiplications, BitNet introduces BitLinear layers with binary (1-bit), ternary (1.58-bit), or quaternary (2-bit) weights. With these quantizations, the multiplications in MatMul reduce to simple additions. For example, the MatMul with ternary quantization can be written as:
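A common way to write the ternary case is sketched below. The symbols are assumptions made for illustration: x_j are the input activations, w_ij are the weights restricted to {-1, 0, +1}, and y_i is one output element.

```latex
\[
y_i \;=\; \sum_{j} w_{ij}\, x_j
    \;=\; \sum_{j \,:\, w_{ij} = +1} x_j \;-\; \sum_{j \,:\, w_{ij} = -1} x_j ,
\qquad w_{ij} \in \{-1,\, 0,\, +1\}
\]
```

Terms with w_ij = 0 drop out entirely, so each output needs only additions and subtractions.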

Each ternary weight carries log2(3) ≈ 1.58 bits of information.

Other quantizations, such as binary and quaternary, map to exactly 1 bit and 2 bits, so they can be stored without wasted space.

Modern computers handle data in bytes, which are made up of whole bits. A ternary weight (1.58 bits) does not fit a whole number of bits, so storing it densely requires additional circuitry or software support. This thesis stores ternary weights in a 2-bit format.
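The storage trade-off can be made concrete with a little arithmetic. The denser base-3 packing in the second line is a well-known alternative shown only for comparison; it is not something this post uses.

```latex
\[
\frac{2\ \text{bits}}{\log_2 3\ \text{bits}} \approx \frac{2}{1.585} \approx 1.26
\qquad \text{(2-bit format: 4 weights per byte, about 26\% above the minimum)}
\]
\[
3^5 = 243 \le 2^8 = 256
\qquad \text{(base-3 packing: 5 weights per byte, } 8/5 = 1.6 \text{ bits per weight)}
\]
```

The base-3 packing saves space but needs a decode step before the weights can be used, which is the kind of extra circuitry or software mentioned above. The 2-bit format can be sliced directly out of a byte.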

To support the different quantizations, this thesis designs a core functional unit:

Depending on the weight, the MUX selects different data:

The SUM4 unit processes four inputs. Each input passes through its own AMux, and the four AMux outputs are combined by an adder tree, as shown in the figure. The result is a 32-bit output, which corresponds to a vector of four 8-bit values.
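The four-mux-plus-adder-tree structure can be sketched in Verilog as follows. This is only an illustration under stated assumptions: the module name `sum4_sketch`, the port names, the lane packing, and the single 10-bit signed sum are hypothetical, and the way the actual design arranges its 32-bit output is not reproduced here. The selection logic mirrors the AMux listed in this post.

```verilog
// Hypothetical sketch of a SUM4-style datapath; names and widths are assumptions.
module sum4_sketch (
    input  wire        [31:0] x,  // four packed 8-bit signed inputs, lane 0 in x[7:0]
    input  wire        [7:0]  w,  // four packed 2-bit weight codes, lane 0 in w[1:0]
    output wire signed [9:0]  y   // sum of the four selected values
);
    // Same selection as the AMux: 00 -> x, 01 -> -x, 10 -> 0, 11 -> -2x.
    // Each result is truncated to 8 bits, as in the AMux.
    function signed [7:0] amux;
        input [7:0] xi;
        input [1:0] wi;
        begin
            case (wi)
                2'b00:   amux = xi;
                2'b01:   amux = ~xi + 8'd1;
                2'b10:   amux = 8'd0;
                default: amux = (~xi + 8'd1) << 1;
            endcase
        end
    endfunction

    wire signed [7:0] p0 = amux(x[7:0],   w[1:0]);
    wire signed [7:0] p1 = amux(x[15:8],  w[3:2]);
    wire signed [7:0] p2 = amux(x[23:16], w[5:4]);
    wire signed [7:0] p3 = amux(x[31:24], w[7:6]);

    // Two-level adder tree; each level adds one bit so the sum cannot overflow.
    wire signed [8:0] s01 = p0 + p1;
    wire signed [8:0] s23 = p2 + p3;
    assign y = s01 + s23;
endmodule
```

Growing the sum by one bit per tree level is a design choice of this sketch: four 8-bit signed values sum to at most a 10-bit signed value, so no intermediate result wraps.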

This figure shows how BNRV is implemented:

Here is the Verilog for the AMux:

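// AMux: selects x, -x, 0, or -2x according to the 2-bit weight code w.
module BNRV_4Mux(input [7:0] x, input [1:0] w, output [7:0] o);
    reg [7:0] t_o;
    always @(*)
        case (w)
            2'b00: t_o = x;               // weight +1: pass x through
            2'b01: t_o = ~x + 1;          // weight -1: two's-complement negation
            2'b10: t_o = {8{1'b0}};       // weight  0: output zero
            2'b11: t_o = (~x + 1) << 1;   // weight -2: negate, then double
        endcase
    assign o = t_o;
endmodule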
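A small testbench can make the four selections easy to check. The sketch below is not part of the original design: the testbench name `amux_tb` and the stimulus value are hypothetical, and it has to be compiled together with the `BNRV_4Mux` module above.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench for BNRV_4Mux; drives x = 5 through all four weight codes.
module amux_tb;
    reg  [7:0] x;
    reg  [1:0] w;
    wire [7:0] o;

    BNRV_4Mux dut (.x(x), .w(w), .o(o));

    initial begin
        x = 8'd5;
        w = 2'b00; #1 $display("w=%b o=%0d", w, $signed(o)); // o = 5
        w = 2'b01; #1 $display("w=%b o=%0d", w, $signed(o)); // o = -5
        w = 2'b10; #1 $display("w=%b o=%0d", w, $signed(o)); // o = 0
        w = 2'b11; #1 $display("w=%b o=%0d", w, $signed(o)); // o = -10
        $finish;
    end
endmodule
```

Because the output is only 8 bits wide, results outside the signed 8-bit range wrap around. For example, x = -128 with weight code 01 yields -128 again.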

Tian`s Blog
