Tag: value function
-

The Critic and the Actor
“I’ll do it this way,” says the actor. “That’s not great,” replies the critic. One of the most widely used architectures in reinforcement learning separates acting from evaluating: the actor chooses actions, and the critic judges how good they were. Actor-Critic methods combine the efficiency of value-based learning with the flexibility of policy-based learning, and they form the foundation of A2C, A3C, PPO, SAC, and the RLHF used to train ChatGPT.
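To make the split concrete, here is a minimal one-step actor-critic sketch. It makes a few assumptions that are not part of the post: a single-state, two-armed bandit with fixed rewards; a softmax actor over action preferences; and a critic that is one scalar value estimate. Because each episode ends after one step, the TD error reduces to `reward - value`. The function names and hyperparameters are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(rewards, steps=2000, alpha_actor=0.1, alpha_critic=0.1, seed=0):
    """Hypothetical one-step actor-critic on a single-state bandit.

    rewards[a] is the (assumed deterministic) reward for action a.
    Returns the actor's final action probabilities and the critic's value.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)  # actor: one preference per action
    value = 0.0                   # critic: estimate of expected reward

    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rewards[action]

        # Critic's evaluation: TD error. The episode ends after this step,
        # so there is no bootstrapped gamma * V(next_state) term.
        td_error = reward - value
        value += alpha_critic * td_error

        # Actor's update: push log pi(action) in the direction of td_error.
        # For a softmax policy, d/d pref_k of log pi(a) = 1[k == a] - pi(k).
        for k in range(len(prefs)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += alpha_actor * td_error * grad_log_pi

    return softmax(prefs), value


probs, value = train_actor_critic([1.0, 0.0])
print(probs, value)
```

The actor never sees rewards directly; it only sees the critic's verdict (`td_error`). A better-than-expected outcome makes the chosen action more likely, and a worse-than-expected one makes it less likely. That division of labor is the idea the methods named above build on, though each of them adds considerably more machinery.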
-

The Number Called Value
Is 10,000 won tomorrow worth more than 9,000 won today? This question sits at the heart of reinforcement learning. A value function compresses an uncertain future into a single number in the present. The Bellman equation, TD learning, and Q-learning make up the mathematics of that foresight.
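The won example can be read as a question about the discount factor γ: a reward one step in the future is worth γ times its face value now. The sketch below assumes illustrative values of γ (0.85 and 0.95, not from the post) and a hypothetical two-step "today → tomorrow" episode. It then applies the TD(0) rule, which nudges V(s) toward the one-step Bellman target r + γV(s′), to learn the value of waiting from experience. All names here are hypothetical.

```python
def present_value(reward, delay_steps, gamma):
    """Value today of a reward received `delay_steps` steps in the future."""
    return (gamma ** delay_steps) * reward


def td0_update(values, state, reward, next_state, alpha, gamma):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state).

    next_state=None marks the end of the episode (its value is taken as 0).
    """
    next_value = 0.0 if next_state is None else values[next_state]
    td_target = reward + gamma * next_value
    values[state] += alpha * (td_target - values[state])


def learn_wait_chain(gamma, episodes=200, alpha=0.5):
    """Hypothetical episode: "today" -> "tomorrow" (reward 0) -> end (reward 10,000)."""
    values = {"today": 0.0, "tomorrow": 0.0}
    for _ in range(episodes):
        td0_update(values, "today", 0.0, "tomorrow", alpha, gamma)
        td0_update(values, "tomorrow", 10_000.0, None, alpha, gamma)
    return values


for gamma in (0.85, 0.95):
    wait_value = present_value(10_000, 1, gamma)
    print(gamma, "wait" if wait_value > 9_000 else "take 9,000 now")
# 0.85 take 9,000 now
# 0.95 wait

print(learn_wait_chain(0.9))  # V("today") approaches 0.9 * 10,000
```

With γ = 0.9 the two options are worth the same, which is why the answer to the opening question depends entirely on how heavily the agent discounts the future. The TD learner reaches the same 0.9 × 10,000 figure without ever being told the formula, only by repeatedly comparing its estimate with the one-step target.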
