What does "The objective is to find an optimal policy which maximizes the expected average reward per time step over an infinite horizon." mean? Online translation, English pronunciation, usage, and example sentences