Research on depth-separable convolution accelerators based on FPGA

Affiliation: School of Instrument and Electronics, North University of China

    Abstract:

    A low-power depthwise separable convolution accelerator core based on FPGA is designed. Exploiting what pointwise (PW) and depthwise (DW) convolution have in common, a single fixed multiplier array computes both types of convolution by changing only the feature and weight input data streams, which maximizes DSP utilization. To address possible sign-bit overflow in 8-bit asymmetric quantization, the dual-multiplier structure is repackaged so that the sign bit is processed separately. A 7-stage intra-layer pipeline maintains the parallelism of data processing in every cycle. The accelerator was deployed on a Zynq UltraScale+ series FPGA. Experiments show that the proposed structure increases network inference speed while reducing dependence on on-chip resources and overall power consumption: the unmodified MobileNetV2 reaches an average throughput of 130.6 GOPS on the proposed accelerator at an overall power consumption of only 4.1 W, which meets the requirements of real-time edge computing. Compared with other hardware platforms, the energy efficiency is significantly improved; compared with accelerators of the same type on FPGA, it has advantages in performance density (GOPS/LUT), power efficiency (GOPS/W), and DSP efficiency (GOPS/DSP).
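
The idea of one fixed multiplier array serving both DW and PW convolution can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the accelerator's actual design: the names `mac_array`, `depthwise_pixel`, and `pointwise_pixel` are hypothetical, the array is modeled as independent dot-product lanes, and quantization, sign-bit handling, and pipelining are left out. It shows that the arithmetic is the same in both cases and only the way features and weights are streamed into the lanes differs.

```python
from typing import List

Vector = List[int]


def mac_array(features: List[Vector], weights: List[Vector]) -> Vector:
    """Hypothetical fixed array: lane i computes dot(features[i], weights[i])."""
    assert len(features) == len(weights)
    return [sum(f * w for f, w in zip(fv, wv)) for fv, wv in zip(features, weights)]


def depthwise_pixel(windows: List[Vector], kernels: List[Vector]) -> Vector:
    """DW: lane c receives channel c's flattened KxK window and its own kernel."""
    return mac_array(windows, kernels)


def pointwise_pixel(pixel: Vector, filters: List[Vector]) -> Vector:
    """PW: every lane receives the same C-channel pixel; lane o gets filter o."""
    return mac_array([pixel] * len(filters), filters)


# Two input channels, one 3x3 window each (flattened row by row).
windows = [[1, 2, 3, 4, 5, 6, 7, 8, 9],
           [1, 0, 1, 0, 1, 0, 1, 0, 1]]
kernels = [[1, 0, 0, 0, 1, 0, 0, 0, 1],   # channel 0: sums the diagonal
           [2, 2, 2, 2, 2, 2, 2, 2, 2]]   # channel 1: doubles the window sum
dw_out = depthwise_pixel(windows, kernels)    # one value per input channel

# Three 1x1 filters mix the two DW outputs into three output channels.
filters = [[1, 1], [1, -1], [2, 0]]
pw_out = pointwise_pixel(dw_out, filters)

print(dw_out)  # [15, 10]
print(pw_out)  # [25, 5, 30]
```

In this model the DW step feeds each lane a different channel, while the PW step broadcasts one pixel to all lanes, so the same `mac_array` is reused with a different data arrangement.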

Cite this article

畫芊昊, 李博, 杜宸罡. Research on depth-separable convolution accelerators based on FPGA [J]. Computer Measurement & Control, 2024, 32(5): 267-273.

History
  • Received: 2023-12-21
  • Last revised: 2024-01-08
  • Accepted: 2024-01-10
  • Published online: 2024-05-22