Embedded AI hardware ranges from general-purpose CPUs and GPUs to FPGAs and special-purpose ASICs. Hardware accelerators reduce latency and increase throughput, which makes compute-intensive AI workloads, such as the convolution layers of CNNs, prime targets for FPGA-based acceleration. The growing complexity of AI models, which demands software-hardware co-design, drives the need for hardware accelerators that can support both the training and inference needs of AI.
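To make the acceleration target concrete, the sketch below shows a naive 2D convolution as plain nested loops (a hypothetical minimal example, not any particular accelerator's implementation). It is exactly this regular, multiply-accumulate-heavy loop nest that an FPGA design would unroll and pipeline in hardware.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (no padding, stride 1) over nested lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    # The four nested loops are the acceleration target: an FPGA
    # implementation would unroll/pipeline them for throughput.
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for m in range(kh):
                for n in range(kw):
                    acc += image[i + m][j + n] * kernel[m][n]
            out[i][j] = acc
    return out
```

For example, sliding a 2x2 identity-diagonal kernel over a 3x3 image sums each pixel with its lower-right neighbor, producing a 2x2 output.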