Tag: optimal policy

  • The Number Called Value

    The Number Called Value

    10,000 won tomorrow or 9,000 won today—which is more valuable? This question sits at the heart of reinforcement learning. Value functions compress uncertain futures into present numbers. The Bellman equation, TD learning, and Q-learning: the mathematics of foresight.