Quantization
Quantization covers various techniques that convert input values from a larger set to a reduced set of output values, thereby reducing the number of bits needed to represent the information. It typically involves converting floating-point numbers to integers – e.g., a 32-bit to an 8-bit representation – which helps reduce power consumption, memory bandwidth, and storage, and improves performance.
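To make the idea concrete, here is a minimal sketch of affine (asymmetric) quantization that maps 32-bit floats to unsigned 8-bit integers. The function names and the specific scheme are illustrative, not taken from the article:

```python
def quantize(values, num_bits=8):
    """Affine quantization: map floats onto the integer range [0, 2^num_bits - 1].

    Illustrative sketch, not a specific library's API.
    Returns the quantized integers plus the (scale, zero_point) needed
    to map them back to real values.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against a constant input
    zero_point = round(qmin - lo / scale)
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    """Recover approximate real values from the quantized integers."""
    return [(q - zero_point) * scale for q in quantized]
```

Because 8 bits can only represent 256 distinct levels, the round trip is lossy: each recovered value differs from the original by at most about one quantization step (`scale`).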
DSD: Dense-Sparse-Dense Training for Deep Neural Networks
Based on further research conducted at Tsinghua University, the authors propose a dynamic-precision data quantization flow and compare it with static-precision quantization strategies. The proposed quantization comprises a two-step process:
- Weight Quantization: Analyzes the dynamic range of the weights in each layer before narrowing them down to an optimal value
- Data Quantization: Uses a greedy algorithm to compare the intermediate data of the fixed-point CNN model and the floating-point CNN model, layer by layer, with the goal of minimizing the accuracy loss
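A simplified sketch of the first step helps show what "analyzing the dynamic range" means in practice: for a layer's weights, try each candidate fractional bit-width of a fixed-point format and keep the one that minimizes quantization error. All names here are hypothetical; the actual flow in the paper operates on full CNN layers and also searches per-layer formats for the intermediate data:

```python
def to_fixed(values, frac_bits, word_len=8):
    """Round values to a fixed-point format with `frac_bits` fractional bits,
    saturating at the limits of a signed `word_len`-bit word, then map back
    to real values so the error can be measured."""
    scale = 2 ** frac_bits
    qmax = 2 ** (word_len - 1) - 1
    qmin = -(2 ** (word_len - 1))
    return [max(qmin, min(qmax, round(v * scale))) / scale for v in values]

def best_frac_bits(values, word_len=8):
    """Per-layer search: pick the fractional length that minimizes the
    squared quantization error over the layer's values (hypothetical
    stand-in for the paper's dynamic-range analysis)."""
    def err(fb):
        return sum((v - q) ** 2 for v, q in zip(values, to_fixed(values, fb, word_len)))
    return min(range(word_len), key=err)
```

Because the format is chosen per layer, a layer with small weights can spend more bits on the fraction, while a layer with a wide dynamic range keeps more integer bits; this is the "dynamic precision" contrasted with a single static format for the whole network.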