Design of a Deep Learning Convolutional Neural Network Acceleration Platform Based on ZYNQ

2022, Vol. 30, No. 12, pp. 264-269

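Author affiliations: 1. School of Computer Science and Technology, Harbin University of Science and Technology; 2. School of Electrical and Electronic Engineering, Harbin University of Science and Technology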
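Funding: National Natural Science Foundation of China (51971086); Heilongjiang Postdoctoral Scientific Research Start-up Fund (LBH-Q16118); Heilongjiang Provincial Universities Basic Research Fund (LGYC2018JC004)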

Design of NVDLA Acceleration Platform Based on ZYNQ

Abstract:

Deploying convolutional neural network (CNN) models on different hardware targets for algorithm acceleration is time-consuming and labor-intensive. To address this, the Tengine toolchain, an emerging deep learning compiler technology, is used to design a general-purpose deep learning accelerator that connects CNN models to the hardware backend quickly and efficiently. The accelerator platform is a ZYNQ-series ZCU104 development board. Following a hardware/software co-design approach, the open-source NVIDIA Deep Learning Accelerator (NVDLA) is mapped onto the field-programmable gate array (FPGA) and combined with the ARM processor to form an SoC system. NVDLA has a well-specified overall architecture that covers both hardware and software design; here the Tengine toolchain replaces the official NVDLA compilation toolchain. LeNet-5 and ResNet-18 are then accelerated on the resulting NVDLA platform to carry out image classification on the MNIST and CIFAR-10 datasets. Experimental results show that inference with the Tengine toolchain is 2.5 times faster than with the official NVDLA compilation toolchain, that the quantization tool is easy to use, and that network model deployment is efficient.

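Cite this article

Liu Zhiyu (刘之禹), Li Shu (李述), Wang Yinghe (王英鹤). Design of a deep learning convolutional neural network acceleration platform based on ZYNQ[J]. Computer Measurement & Control, 2022, 30(12): 264-269.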

History

Received: 2022-05-16
Last revised: 2022-06-11
Accepted: 2022-06-13
Published online: 2022-12-22