To further reduce the computational burden of floating-point or integer multiplications, BitNet introduces BitLinear layers with binary (1-bit), ternary (1.58-bit), or quaternary (2-bit) weights. With these quantization schemes, the multiplications in MatMul reduce to simple additions. For example, MatMul with ternary quantization can be written as:
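As a minimal illustrative sketch (not the thesis's own formulation), the Python functions below compute a dot product and a matrix-vector product with ternary weights using only additions and subtractions. The names `ternary_dot` and `ternary_matvec`, and the representation of weights as the integers -1, 0, and +1, are assumptions made for this example.

```python
def ternary_dot(x, w):
    """Dot product of activations x with ternary weights w in {-1, 0, +1}.

    No multiplication is used: each weight selects add, subtract, or skip.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    acc = 0
    for xi, wi in zip(x, w):
        if wi == 1:
            acc += xi          # weight +1: add the activation
        elif wi == -1:
            acc -= xi          # weight -1: subtract the activation
        elif wi != 0:          # weight 0 contributes nothing
            raise ValueError(f"weight {wi} is not ternary")
    return acc


def ternary_matvec(W, x):
    """Matrix-vector product where every row of W is ternary."""
    return [ternary_dot(x, row) for row in W]
```

For instance, `ternary_dot([3, -5, 7, 2], [1, -1, 0, -1])` evaluates 3 + 5 - 2 without a single multiply.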
Ternary weights can be stored as
Other quantization schemes, such as binary and quaternary, are formatted as 1 bit and 2 bits respectively, which is optimal for storage.
Modern computers handle data in bytes, which are made up of bits. A ternary width (1.58 bits) therefore requires additional circuitry or software support to store. This thesis stores ternary weights in a 2-bit format.
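The 2-bit storage of ternary weights can be sketched in Python as follows. The specific encoding (`0b00` for 0, `0b01` for +1, `0b11` for -1), the bit order within a byte, and the function names `pack_ternary` and `unpack_ternary` are all hypothetical choices for illustration; the thesis does not state its encoding in this passage.

```python
# Assumed 2-bit encoding (hypothetical, not taken from the thesis):
#   0b00 -> 0, 0b01 -> +1, 0b11 -> -1 (0b10 is unused)
ENCODE = {0: 0b00, 1: 0b01, -1: 0b11}
DECODE = {code: value for value, code in ENCODE.items()}


def pack_ternary(weights):
    """Pack ternary weights into bytes, four 2-bit codes per byte.

    Weight i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
    Unused slots in the last byte stay 0b00, which decodes as weight 0.
    """
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        packed[i // 4] |= ENCODE[w] << (2 * (i % 4))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover `count` ternary weights from bytes produced by pack_ternary."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
            for i in range(count)]
```

With this layout, five weights fit in two bytes. The cost is 2 bits per weight rather than the theoretical log2(3) ≈ 1.58 bits, in exchange for codes that align with byte boundaries.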
To support the different quantization schemes, this thesis designs a core functional unit:
According to the different weights
SUM4 processes four inputs. Each input passes through one of the four AMux units, and the results are combined by an adder tree, as shown in the figure. The result is a 32-bit output, that is, four 8-bit vectors.
This figure shows how BNRV is implemented:
The Verilog for AMux is:
module BNRV_4Mux(input [7:0] x, input [1:0] w, output [7:0] o);