Reinforcement Learning 2
MDP Prediction and Control
先给出上回说到的两个重要公式:
\[\begin{gather}
v^{\pi}(s)=\sum_{a \in A}\pi(a|s) \big( R(s,a)+\gamma \sum_{s' \in S}P(s'|s,a)v^{\pi}(s') \big) \\
q^{\pi}(s,a)=R(s,a)+\gamma \sum_{s' \in S}P(s'|s,a) \sum_{a' \in...