Research on Intelligent Control of Biped Robot Based on D-DQN Reinforcement Learning Algorithm

2024, Vol. 32, No. 3, pp. 181-187

Fund project: 2022 Higher Education Teaching Reform Project of Guangzhou Huashang College (HS2022ZLGC71)



Abstract:

To address the large trajectory deviation and low efficiency of existing intelligent control algorithms for biped robots, a control algorithm based on D-DQN reinforcement learning is proposed. First, the coordinate transformation relationships in biped robot motion and the joint-link compensation process are analyzed. A Q-value network is then used to reduce the dimensionality of the complex nonlinear motion process. A dual-weight design, combining Q-value network weights with auxiliary weights, further strengthens the performance of the DQN network, and the Tanh function is used as the activation function of the neural network to improve the numerical training capability of the DQN network. The experience replay pool plays a key supporting role in data training and interaction, and feeding the reward value into the objective function further improves the control accuracy of the biped robot. Finally, virtual constraint control is used to improve the stability of the robot during motion. Experimental results show that, under the D-DQN reinforcement learning control algorithm, the robot completes the first-stage test in only 115 s with a comprehensive trajectory deviation of 0.02 m, and shows good stability in the gait-switching limit cycle test.
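The abstract names three ingredients without giving formulas: two sets of network weights, a Tanh activation, and an experience replay pool whose rewards enter the training objective. The sketch below shows one common way these pieces fit together, assuming that "D-DQN" refers to the standard Double DQN target and that the "auxiliary weights" act as a target network. Both are assumptions, not details confirmed by the abstract. All names (`ReplayBuffer`, `init_weights`, `q_values`, `double_dqn_targets`), the layer sizes, and the discount factor are hypothetical and chosen only for illustration.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size pool of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (np.array(states, dtype=np.float64),
                np.array(actions, dtype=np.int64),
                np.array(rewards, dtype=np.float64),
                np.array(next_states, dtype=np.float64),
                np.array(dones, dtype=np.float64))

    def __len__(self):
        return len(self.buffer)


def init_weights(rng, state_dim, hidden_dim, n_actions):
    """Random weights for a one-hidden-layer Q-network (sizes are illustrative)."""
    return {
        "W1": rng.normal(0.0, 0.1, (state_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, n_actions)),
        "b2": np.zeros(n_actions),
    }


def q_values(weights, states):
    """Forward pass: tanh hidden layer, linear output with one Q-value per action."""
    hidden = np.tanh(states @ weights["W1"] + weights["b1"])
    return hidden @ weights["W2"] + weights["b2"]


def double_dqn_targets(online, target, rewards, next_states, dones, gamma=0.99):
    """Standard Double DQN target (assumed here, not taken from the paper):
    y = r + gamma * Q_target(s', argmax_a Q_online(s', a)), with no bootstrap
    term when s' is terminal."""
    best_actions = np.argmax(q_values(online, next_states), axis=1)
    next_q = q_values(target, next_states)
    chosen = next_q[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * chosen


# Toy usage with random data; a real controller would use robot states instead.
rng = np.random.default_rng(0)
online = init_weights(rng, state_dim=4, hidden_dim=8, n_actions=3)
target = {k: v.copy() for k, v in online.items()}  # auxiliary weights start as a copy

pool = ReplayBuffer(capacity=100)
for _ in range(32):
    s, s2 = rng.normal(size=4), rng.normal(size=4)
    pool.push(s, int(rng.integers(3)), float(rng.normal()), s2, False)

states, actions, rewards, next_states, dones = pool.sample(16)
targets = double_dqn_targets(online, target, rewards, next_states, dones)
```

In this arrangement the online weights choose the next action and the auxiliary weights evaluate it, which is the usual motivation for keeping two weight sets: it reduces the overestimation that arises when a single network both selects and scores actions. The `targets` array would then serve as the regression target for `q_values(online, states)` at the sampled `actions`.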


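Cite this article

Li Lixia, Chen Yan. Research on Intelligent Control of Biped Robot Based on D-DQN Reinforcement Learning Algorithm [J]. Computer Measurement & Control, 2024, 32(3): 181-187.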


History

Received: 2023-08-22
Last revised: 2023-09-08
Accepted: 2023-09-11
Published online: 2024-04-01
