The method uses an actor-critic architecture: during critic training, a recursive least-squares TD (RLS-TD) method estimates the parameters of the value function, and during actor training, a value-gradient method improves the control policy.

 
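  • This method adopts an actor-critic structure, using the recursive least-squares TD (RLS-TD) method to estimate the value-function parameters in critic training and a value-gradient descent method to improve the control policy in actor training.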
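
The critic step named in the sentence can be illustrated with a minimal sketch of one RLS-TD(0) update for a linear value function V(s) = theta · phi(s). This is the textbook recursion, not necessarily the exact variant the quoted method uses. The function name `rls_td_update`, the discount `gamma`, and the initial matrix `P = delta * I` are assumptions made for illustration, and the actor's value-gradient step is not shown.

```python
def rls_td_update(theta, P, phi, phi_next, reward, gamma=0.95):
    """One RLS-TD(0) step (hypothetical helper, illustrative only).

    theta    : current weight vector (list of floats)
    P        : current n x n matrix (list of lists), initialised as delta * I
    phi      : feature vector of the current state
    phi_next : feature vector of the next state (all zeros if terminal)
    """
    n = len(theta)
    # d = phi - gamma * phi_next appears in the TD fixed-point equations.
    d = [phi[i] - gamma * phi_next[i] for i in range(n)]
    # P @ phi (column vector) and d^T @ P (row vector).
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    d_P = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(d[i] * P_phi[i] for i in range(n))
    gain = [x / denom for x in P_phi]
    # TD error for the linear value function: r + gamma*V(s') - V(s).
    td_error = reward - sum(d[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * td_error for i in range(n)]
    # Sherman-Morrison style rank-one update of P.
    new_P = [[P[i][j] - gain[i] * d_P[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P


def run_two_state_chain(sweeps=50, gamma=0.5, delta=1000.0):
    """Toy chain (assumed example): s0 -> s1 with reward 1, s1 -> terminal with reward 2."""
    theta = [0.0, 0.0]
    P = [[delta, 0.0], [0.0, delta]]
    transitions = [
        ([1.0, 0.0], [0.0, 1.0], 1.0),  # s0 -> s1
        ([0.0, 1.0], [0.0, 0.0], 2.0),  # s1 -> terminal
    ]
    for _ in range(sweeps):
        for phi, phi_next, reward in transitions:
            theta, P = rls_td_update(theta, P, phi, phi_next, reward, gamma)
    return theta
```

With one-hot features and `gamma=0.5`, the true values of the toy chain are V(s1) = 2 and V(s0) = 1 + 0.5 · 2 = 2, so `run_two_state_chain()` should return weights close to `[2, 2]`. A large `delta` keeps the bias introduced by the initial matrix small.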