Acceleration of CNNs for Person Detection in Autonomous Driving using FPGAs and SIMD
Optimized a proprietary FPGA hardware accelerator for convolutional neural networks and computer vision algorithms on a Xilinx UltraZED SoC platform, targeting person detection and segmentation in RGB video for autonomous driving.
- Reduced combined CPU pre- and postprocessing time from 800ms to 55ms per frame by vectorizing the computations with SIMD (Arm Neon).
- Reduced inference time of the FPGA hardware accelerator (written in HLS) from 240ms to 140ms per frame by redesigning it from the ground up with an optimized memory layout and better resource utilization (DSP unit utilization increased from 70% to 95%).
- Reduced total processing time per frame from 1040ms to 195ms, a speedup of roughly 5.3x.
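
To illustrate the kind of SIMD vectorization behind the preprocessing speedup, here is a minimal sketch in C. It is not the project's code: the function name `normalize_u8_to_f32` is hypothetical, and scaling 8-bit pixel values to floats in [0, 1] is only an assumed example of a preprocessing step. The Neon path handles 16 pixels per iteration; a scalar loop handles the tail and also serves as a fallback on targets without Neon.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (assumption, not from the project):
 * convert n 8-bit pixel values to floats in [0, 1]. */
void normalize_u8_to_f32(const uint8_t *src, float *dst, size_t n)
{
    const float scale = 1.0f / 255.0f;
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Vector path: load 16 bytes, widen u8 -> u16 -> u32,
     * convert to float, scale, and store 4 x 4 floats. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t px = vld1q_u8(src + i);
        uint16x8_t lo = vmovl_u8(vget_low_u8(px));
        uint16x8_t hi = vmovl_u8(vget_high_u8(px));

        float32x4_t f0 = vcvtq_f32_u32(vmovl_u16(vget_low_u16(lo)));
        float32x4_t f1 = vcvtq_f32_u32(vmovl_u16(vget_high_u16(lo)));
        float32x4_t f2 = vcvtq_f32_u32(vmovl_u16(vget_low_u16(hi)));
        float32x4_t f3 = vcvtq_f32_u32(vmovl_u16(vget_high_u16(hi)));

        vst1q_f32(dst + i,      vmulq_n_f32(f0, scale));
        vst1q_f32(dst + i + 4,  vmulq_n_f32(f1, scale));
        vst1q_f32(dst + i + 8,  vmulq_n_f32(f2, scale));
        vst1q_f32(dst + i + 12, vmulq_n_f32(f3, scale));
    }
#endif

    /* Scalar tail, and full fallback when Neon is unavailable. */
    for (; i < n; ++i)
        dst[i] = (float)src[i] * scale;
}
```

The same pattern (wide loads, lane-wise arithmetic, scalar tail) would apply to other per-pixel steps such as mean subtraction or thresholding of the output masks, though which steps were actually vectorized is not stated above.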