RL agents essentially do backward induction

6 min read · Jun 15, 2021

This post assumes the reader has some familiarity with Markov decision processes (MDPs) and game theory.

Policy iteration and value iteration are two of the most basic algorithms in tabular reinforcement learning. Modern game-playing AI agents like AlphaStar and AlphaGo are based on approximations to these algorithms. The purpose of this post is to give some intuition for what these algorithms are doing. For simplicity, I will restrict attention to deterministic MDPs.
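To make the setting concrete, here is a minimal sketch of tabular value iteration on a deterministic MDP. The four-state chain, the `actions` and `step` helpers, the reward of 1 for reaching the last state, and the discount of 0.5 are all assumptions made up for illustration; they are not taken from any particular library or paper.

```python
def value_iteration(states, actions, step, gamma=0.9, tol=1e-9):
    """Tabular value iteration for a deterministic MDP.

    actions(s) returns the actions available in s (empty for terminal states).
    step(s, a) returns the pair (next_state, reward).
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            available = actions(s)
            if not available:
                continue  # terminal state: value stays at 0
            # Bellman optimality backup: best one-step reward plus discounted value
            best = max(r + gamma * V[s2]
                       for s2, r in (step(s, a) for a in available))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


# Hypothetical toy problem: a chain 0 -> 1 -> 2 -> 3, where 3 is terminal.
STATES = [0, 1, 2, 3]

def actions(s):
    return [] if s == 3 else ["stay", "right"]

def step(s, a):
    if a == "stay":
        return s, 0.0
    # Moving right pays 1 only when it reaches the terminal state.
    return s + 1, (1.0 if s + 1 == 3 else 0.0)

V = value_iteration(STATES, actions, step, gamma=0.5)
print(V)
```

In this toy chain the value of state 2 settles first, then state 1, then state 0: each sweep pushes information one step further back from the terminal state.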

In this post, I’ll explain that we can think of value iteration as…

Written by AI Explanations