Tag: optimal policy
-

The Number Called Value
10,000 won tomorrow or 9,000 won today—which is more valuable? This question sits at the heart of reinforcement learning. Value functions compress uncertain futures into present numbers. The Bellman equation, TD learning, and Q-learning: the mathematics of foresight.
