FPGA-QNN: Quantized Neural Network Hardware Acceleration on FPGAs

dc.authoridTASCI, MUSTAFA/0000-0002-8073-8587
dc.authoridTUMEN, VEDAT/0000-0003-0271-216X
dc.contributor.authorTasci, Mustafa
dc.contributor.authorIstanbullu, Ayhan
dc.contributor.authorTumen, Vedat
dc.contributor.authorKosunalp, Selahattin
dc.date.accessioned2025-07-03T21:25:20Z
dc.date.issued2025
dc.departmentBalıkesir Üniversitesi
dc.description.abstractRecently, convolutional neural networks (CNNs) have received massive interest due to their ability to achieve high accuracy in various artificial intelligence tasks. With the development of complex CNN models, a significant drawback is their high computational burden and memory requirements. The performance of a typical CNN model can be enhanced through improved hardware accelerators. Practical implementations on field-programmable gate arrays (FPGAs) have the potential to reduce resource utilization while maintaining low power consumption. Nevertheless, complex CNN models implemented on FPGAs may require computational and memory capacities exceeding those available on many current FPGAs. An effective solution to this issue is to use quantized neural network (QNN) models, which remove the burden of full-precision weights and activations. This article proposes an accelerator design framework for FPGAs, called FPGA-QNN, aimed particularly at reducing the high computational burden and memory requirements of CNN implementations. To approach this goal, FPGA-QNN exploits QNN models by converting the high burden of full-precision weights and activations into integer operations. The FPGA-QNN framework provides 12 accelerators based on multi-layer perceptron (MLP) and LeNet CNN models, each associated with a specific combination of quantization and folding. Performance evaluations on the Xilinx PYNQ Z1 development board demonstrated the superiority of FPGA-QNN in terms of resource utilization and energy efficiency in comparison to several recent approaches. The proposed MLP model classified the FashionMNIST dataset at a speed of 953 kFPS with 1019 GOPs while consuming 2.05 W.
dc.description.sponsorshipBalikesir University Scientific Research and Development Support Program (BAP) in Turkiye [2020/091]
dc.description.sponsorshipThis work has been supported by Balikesir University Scientific Research and Development Support Program (BAP) in Turkiye under project number 2020/091.
dc.identifier.doi10.3390/app15020688
dc.identifier.issn2076-3417
dc.identifier.issue2
dc.identifier.scopus2-s2.0-85215821024
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.3390/app15020688
dc.identifier.urihttps://hdl.handle.net/20.500.12462/21474
dc.identifier.volume15
dc.identifier.wosWOS:001403933300001
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherMDPI
dc.relation.ispartofApplied Sciences-Basel
dc.relation.publicationcategoryMakale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20250703
dc.subjectaccelerator
dc.subjectFPGA
dc.subjectQNN
dc.subjectdeep learning
dc.subjectFINN
dc.titleFPGA-QNN: Quantized Neural Network Hardware Acceleration on FPGAs
dc.typeArticle