Abstract: For cyclical tasks with absorbing goal states, a reasonable approach is to use reinforcement learning based on the average-reward model. Average-reward reinforcement learning has the advantages of fast convergence and strong robustness.
