
Let u and u∗ be arbitrary final reward vectors with u ≤ u∗.

(a) Let k be an arbitrary stationary policy and prove that v^k(n, u) ≤ v^k(n, u∗) for each n ≥ 1.
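A quick numerical sanity check can make part (a) concrete, though it proves nothing. The sketch below assumes the standard stage-by-stage recursion for a stationary policy k, v^k(n, u) = r^k + [P^k] v^k(n−1, u) with v^k(0, u) = u, where r^k is the reward vector and [P^k] the transition matrix of policy k. The function name `stationary_values` and the two-state numbers are hypothetical.

```python
def stationary_values(r, P, u, n):
    """Return v^k(n, u) for one stationary policy k (hypothetical helper).

    r: reward vector r^k, P: transition matrix [P^k] as a list of rows,
    u: final reward vector. Assumes v^k(m, u) = r^k + [P^k] v^k(m-1, u),
    starting from v^k(0, u) = u.
    """
    v = list(u)
    for _ in range(n):
        # Build the next stage's vector from the previous stage's values.
        v = [r[i] + sum(P[i][j] * v[j] for j in range(len(v)))
             for i in range(len(v))]
    return v


# Hypothetical two-state policy and two final-reward vectors with u <= u*.
r = [1.0, 2.0]
P = [[0.5, 0.5], [0.2, 0.8]]
u, u_star = [0.0, 0.0], [1.0, 3.0]

for n in range(1, 11):
    low = stationary_values(r, P, u, n)
    high = stationary_values(r, P, u_star, n)
    # Componentwise check of v^k(n, u) <= v^k(n, u*).
    assert all(a <= b for a, b in zip(low, high))
print("v^k(n, u) <= v^k(n, u*) for n = 1..10")
```

Plain lists keep the sketch dependency-free; with NumPy arrays the update would simply be `r + P @ v`.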

(b) For the optimal dynamic policy, prove that v∗(n, u) ≤ v∗(n, u∗) for each n ≥ 1.

This is known as the monotonicity theorem.
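The same kind of check can be run for part (b). This sketch assumes the optimal n-stage values satisfy v∗_i(n, u) = max_k [r_i^(k) + Σ_j P_ij^(k) v∗_j(n−1, u)] with v∗(0, u) = u, where the maximum runs over the decisions k available in state i. The name `optimal_values` and the decision table are hypothetical illustrations, not part of the exercise.

```python
def optimal_values(decisions, u, n):
    """Return v*(n, u) by backward dynamic programming (hypothetical helper).

    decisions[i] lists the (reward, transition_row) pairs available in
    state i. Assumes v*_i(m, u) = max_k [r_i^(k) + sum_j P_ij^(k) v*_j(m-1, u)]
    with v*(0, u) = u.
    """
    v = list(u)
    for _ in range(n):
        # The comprehension reads the previous stage's v before v is rebound.
        v = [max(r + sum(p * vj for p, vj in zip(row, v))
                 for r, row in decisions[i])
             for i in range(len(decisions))]
    return v


# Hypothetical two-state problem with two decisions per state.
decisions = [
    [(1.0, [0.5, 0.5]), (0.0, [0.0, 1.0])],
    [(2.0, [0.2, 0.8]), (3.0, [1.0, 0.0])],
]
u, u_star = [0.0, 0.0], [1.0, 3.0]

for n in range(1, 11):
    low = optimal_values(decisions, u, n)
    high = optimal_values(decisions, u_star, n)
    # Componentwise check of v*(n, u) <= v*(n, u*).
    assert all(a <= b for a, b in zip(low, high))
print("v*(n, u) <= v*(n, u*) for n = 1..10")
```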

(c) Now let u and u∗ be arbitrary. Let α = max_i (u_i − u∗_i). Show that

v∗(n, u) ≤ v∗(n, u∗) + αe,

where e is the column vector whose components are all 1.
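Part (c) can also be spot-checked numerically. This sketch assumes the optimal-value recursion v∗_i(n, u) = max_k [r_i^(k) + Σ_j P_ij^(k) v∗_j(n−1, u)] with v∗(0, u) = u, and uses exact `fractions.Fraction` arithmetic so that a bound which is nearly tight is not disturbed by floating-point rounding. The helper name `optimal_values` and all numbers are hypothetical; note that u is deliberately not ≤ u∗ here.

```python
from fractions import Fraction as F


def optimal_values(decisions, u, n):
    """Return v*(n, u) (hypothetical helper); decisions[i] lists the
    (reward, transition_row) pairs available in state i."""
    v = list(u)
    for _ in range(n):
        # Each component takes the best decision against the previous stage.
        v = [max(r + sum(p * vj for p, vj in zip(row, v))
                 for r, row in decisions[i])
             for i in range(len(decisions))]
    return v


# Hypothetical two-state problem; each transition row sums to 1.
decisions = [
    [(F(1), [F(1, 2), F(1, 2)]), (F(0), [F(0), F(1)])],
    [(F(2), [F(1, 5), F(4, 5)]), (F(3), [F(1), F(0)])],
]
u, u_star = [F(5), F(-1)], [F(1), F(3)]
alpha = max(a - b for a, b in zip(u, u_star))  # alpha = max_i (u_i - u*_i)

for n in range(1, 11):
    lhs = optimal_values(decisions, u, n)
    rhs = [x + alpha for x in optimal_values(decisions, u_star, n)]
    # Componentwise check of v*(n, u) <= v*(n, u*) + alpha * e.
    assert all(a <= b for a, b in zip(lhs, rhs))
print("bound holds for n = 1..10 with alpha =", alpha)
```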
