A note on finite dynamic games of complete information, with careful definitions, a full backward-induction example, and a clean statement of the one-shot deviation principle. The goal is to make the logic of subgame-perfect equilibrium precise and easy to read.
A finite dynamic game is usually described in extensive form. The game unfolds over time, players may move at different nodes, and payoffs are realized at the end of the path.
A finite extensive-form game of complete information can be written as
\[ \Gamma = \bigl(N,H,Z,P,(A(h))_{h \in H \setminus Z},(u_i)_{i \in N}\bigr). \]The ingredients are:
| Concept | Meaning | Why it matters here |
|---|---|---|
| Complete information | All players know the game structure, action sets, and payoff functions. | This is the benchmark setting for the note. |
| Perfect information | Whenever a player moves, that player knows the entire past history of play. | Backward induction applies most cleanly in this case. |
Complete information does not automatically imply perfect information. A game can have known payoffs and known rules, yet still hide earlier actions from later movers. In this note, backward induction is presented for the perfect-information case, while the one-shot deviation principle is stated in the broader finite-horizon perfect-recall setting.
If \(h\) is a history and \(a \in A(h)\), then \((h,a)\) or simply \(ha\) denotes the successor history obtained by taking action \(a\) after \(h\).
A history \(z\in Z\) is terminal if no further action is available. A nonterminal history is a decision node.
"Finite" means the tree has finitely many histories, so the game ends after finitely many moves. This is exactly what allows recursive arguments such as backward induction.
In a dynamic game, a player's strategy is not just the action actually observed on the realized path. It is a complete contingent plan that tells the player what to do at every decision node where that player might move.
Let \(H_i = \{h \in H \setminus Z : P(h)=i\}\) be the set of decision histories of player \(i\). A pure strategy for player \(i\) is a function
\[ s_i : H_i \to \bigcup_{h \in H_i} A(h) \]such that \(s_i(h) \in A(h)\) for every \(h \in H_i\).
The full strategy profile is \(s=(s_1,\dots,s_n)\). Because the game tree is finite, any profile determines a unique terminal history \(z(s)\), and each player receives payoff \(u_i(z(s))\).
A subgame starting at history \(h\) is the continuation game obtained by taking \(h\) as the new root and keeping all later feasible actions and payoffs. In perfect-information trees, every decision node defines a subgame.
A strategy profile \(s^*\) is a subgame-perfect equilibrium if it induces a Nash equilibrium in every subgame of the original game.
Ordinary Nash equilibrium checks whether a player wants to deviate from the full strategy profile before the game starts. It does not force off-path continuation actions to be optimal when those nodes are actually reached.
Subgame perfection asks for Nash equilibrium inside every continuation game. That means every promised punishment or reward must still be optimal when the continuation is reached.
Consider the following entry game, with payoffs listed as \((\text{entrant},\text{incumbent})\):
| Entrant's move | Incumbent's continuation | Outcome payoff |
|---|---|---|
| Out | Game ends | \((0,2)\) |
| Enter | Fight | \((-1,-1)\) |
| Enter | Accommodate | \((2,1)\) |
Threat that can support Nash: the incumbent says, "If you enter, I will fight." If the entrant believes this, staying out is optimal because \(0 > -1\).
What actually happens in the entry subgame: once entry occurs, the incumbent compares \(-1\) from Fight with \(1\) from Accommodate, so Accommodate is better.
Conclusion: the threat to Fight is non-credible. Nash equilibrium may still allow the profile because the off-path move is never tested, but subgame perfection rejects it because the continuation action is not optimal when the subgame is reached.
Backward induction is the standard solution method for finite dynamic games with perfect information. The idea is simple: solve the last decision problem first, replace it by its optimal payoff, and keep moving backward until the root is solved.
For each terminal history \(z\in Z\), define
\[ V_i(z)=u_i(z). \]If \(h\) is a nonterminal history and player \(i=P(h)\) moves at \(h\), then define
\[ V_i(h)=\max_{a \in A(h)} V_i(ha). \]An optimal action at \(h\) is any
\[ a^*(h) \in \arg\max_{a \in A(h)} V_i(ha). \]Consider a market-entry game. First the entrant chooses whether to stay out or enter. If entry occurs, the incumbent chooses whether to fight or accommodate.
If the entrant has already chosen Enter, the incumbent compares:
\[ u_I(\text{Fight})=-1, \qquad u_I(\text{Accommodate})=1. \]So the incumbent chooses Accommodate.
The entrant now anticipates that Enter will be followed by Accommodate, so the relevant comparison is:
\[ u_E(\text{Out})=0, \qquad u_E(\text{Enter then Accommodate})=2. \]Therefore the entrant chooses Enter.
In the strategic form, the profile \((\text{Out},\text{Fight})\) can be a Nash equilibrium: if the entrant expects Fight after entry, staying Out is optimal, and if the entrant stays Out, the incumbent never has to carry out Fight. But this threat is not credible.
| Question | Answer in this game |
|---|---|
| What does the last mover do? | The incumbent accommodates because \(1 > -1\). |
| What does the first mover infer? | Entry leads to payoff \(2\), which is better than staying out and getting \(0\). |
| Why is the solution subgame perfect? | It is optimal in the whole game and in the subgame after entry. |
This is one of the best-known dynamic-game puzzles. Three pirates, ordered by seniority as \(A\), \(B\), and \(C\), must divide \(100\) gold coins.
The key is to solve the game from the end.
| Remaining pirates | What the proposer needs | Backward-induction outcome |
|---|---|---|
| Only \(C\) | No vote problem remains | \(C\) gets all \(100\) coins. |
| \(B\) and \(C\) | At least half of \(2\) votes, so \(1\) vote is enough | \(B\) votes for his own proposal and keeps all \(100\): \((0,100,0)\). |
| \(A\), \(B\), and \(C\) | At least half of \(3\) votes, so \(2\) votes are needed | \(A\) needs one more pirate besides himself. |
Then only \(B\) and \(C\) remain. From the table above, \(B\) can pass \((0,100,0)\) using his own vote. So:
\[ ext{If } A \text{ dies, then } B \text{ expects } 100 \text{ and } C \text{ expects } 0. \]Since \(B\) expects \(100\) if \(A\) is eliminated, \(B\) will not support any proposal from \(A\) giving him less than \(100\). But \(C\) expects \(0\) if \(A\) is eliminated, so giving \(C\) just \(1\) coin is enough to make \(C\) prefer acceptance.
Backward induction is a constructive method. The one-shot deviation principle is mainly a verification tool. Instead of checking every possible alternative strategy, it says we only need to check whether some player can gain by changing the action at one decision point and leaving the rest of the strategy unchanged.
Fix a player \(i\) and a strategy profile \(s=(s_i,s_{-i})\). A strategy \(s_i'\) is a one-shot deviation from \(s_i\) at decision point \(h\in H_i\) if
In a finite-horizon extensive-form game with perfect recall, a strategy profile \(s\) is subgame perfect if and only if there is no profitable one-shot deviation in any subgame.
Write \(U_i(s \mid h)\) for player \(i\)'s continuation payoff from history \(h\) onward when strategy profile \(s\) is played. Then the condition is:
\[ U_i(s \mid h) \ge U_i(s_i', s_{-i} \mid h) \]for every player \(i\), every decision point \(h\in H_i\), and every one-shot deviation \(s_i'\) at \(h\).
Take the standard stage game:
| Player 1 \ Player 2 | C | D |
|---|---|---|
| C | (3,3) | (0,5) |
| D | (5,0) | (1,1) |
Now repeat this game for two periods, with total payoff equal to the sum of the two stage payoffs. The game has complete information and finite horizon. Histories after period 1 are observed, so the game has perfect recall.
Suppose both players plan to choose \(C\) in both periods after every history. This looks cooperative, but it is not subgame perfect.
At any history before period 2, if the opponent plans to play \(C\), then
\[ u_i(D,C)=5 \quad \text{and} \quad u_i(C,C)=3. \]So switching from \(C\) to \(D\) at that single decision point raises the player's payoff by \(2\).
Consider the strategy profile \(s^D\) in which each player chooses \(D\) after every history in both periods.
At the last period, \(D\) strictly dominates \(C\):
\[ u_i(D,C)=5 > 3=u_i(C,C), \qquad u_i(D,D)=1 > 0=u_i(C,D). \]So no player wants to deviate from \(D\) to \(C\) in period 2.
Because continuation play in period 2 is already fixed at \((D,D)\), the period-1 comparison is just the stage-game comparison plus the same continuation payoff on both sides. So \(D\) again strictly dominates \(C\).
| Tool | Best use | Main output |
|---|---|---|
| Backward induction | Construct an equilibrium in a finite perfect-information game | A subgame-perfect equilibrium obtained by recursive optimization |
| One-shot deviation principle | Verify a candidate equilibrium in a finite-horizon game with perfect recall | A quick test for subgame perfection |