Let u and u∗ be arbitrary final reward vectors with u ≤ u∗.
(a) Let k be an arbitrary stationary policy and prove that v^k(n, u) ≤ v^k(n, u∗) for each
n ≥ 1.
(b) For the optimal dynamic policy, prove that v∗(n, u) ≤ v∗(n, u∗) for each n ≥ 1.
This is known as the monotonicity theorem.
(c) Now let u and u∗ be arbitrary. Let α = max_i (u_i − u∗_i). Show that
v∗(n, u) ≤ v∗(n, u∗) + αe.
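
Before attempting the proof, it can help to check the bound in part (c) numerically on a small example. The sketch below is only a sanity check, not a proof. It assumes the usual dynamic-programming recursion v∗(0, u) = u and v∗_i(n, u) = max_k { r_i(k) + Σ_j P_ij(k) v∗_j(n − 1, u) }, and takes e to be the all-ones vector. The function names (`optimal_values`, `random_mdp`, `check_bound`) and the random test problem are hypothetical and chosen here for illustration.

```python
import random


def optimal_values(P, r, u, n):
    """Return v*(n, u) under the assumed recursion.

    P[i][k][j] is the transition probability from state i to state j
    under decision k, r[i][k] is the reward in state i under decision k,
    and u is the final reward vector, so v*(0, u) = u.
    """
    v = list(u)
    states = range(len(v))
    for _ in range(n):
        v = [
            max(
                r[i][k] + sum(P[i][k][j] * v[j] for j in states)
                for k in range(len(r[i]))
            )
            for i in states
        ]
    return v


def random_mdp(num_states, num_actions, rng):
    """Build a random decision problem (illustrative data only)."""
    P, r = [], []
    for _ in range(num_states):
        rows = []
        for _ in range(num_actions):
            weights = [rng.random() + 1e-3 for _ in range(num_states)]
            total = sum(weights)
            rows.append([w / total for w in weights])  # each row sums to 1
        P.append(rows)
        r.append([rng.uniform(-1.0, 1.0) for _ in range(num_actions)])
    return P, r


def check_bound(seed=0, num_states=4, num_actions=3, horizon=6):
    """Check v*(n, u) <= v*(n, u*) + alpha*e for n = 1, ..., horizon."""
    rng = random.Random(seed)
    P, r = random_mdp(num_states, num_actions, rng)
    u = [rng.uniform(-5.0, 5.0) for _ in range(num_states)]
    u_star = [rng.uniform(-5.0, 5.0) for _ in range(num_states)]
    alpha = max(a - b for a, b in zip(u, u_star))
    for n in range(1, horizon + 1):
        v = optimal_values(P, r, u, n)
        v_star = optimal_values(P, r, u_star, n)
        # Small tolerance for floating-point round-off.
        if any(x > y + alpha + 1e-9 for x, y in zip(v, v_star)):
            return False
    return True
```

Calling `check_bound(seed)` for several seeds should return `True` every time if the bound holds. The vectors u and u∗ are drawn independently, so α may be positive or negative; when u ≤ u∗ we have α ≤ 0, and the bound then gives the monotonicity statement of part (b).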