Game Theory 3: Finite Dynamic Games, Backward Induction, and the One-Shot Deviation Principle

A note on finite dynamic games of complete information, with careful definitions, a full backward-induction example, and a clean statement of the one-shot deviation principle. The goal is to make the logic of subgame-perfect equilibrium precise and easy to read.

Dynamic games Complete information Subgame perfection Backward induction One-shot deviation Repeated games

Table of contents

1. Finite dynamic games of complete information extensive-form foundation

A finite dynamic game is usually described in extensive form. The game unfolds over time, players may move at different nodes, and payoffs are realized at the end of the path.

Extensive-form game

A finite extensive-form game of complete information can be written as

\[ \Gamma = \bigl(N,H,Z,P,(A(h))_{h \in H \setminus Z},(u_i)_{i \in N}\bigr). \]

The ingredients are:

  • Players. \(N=\{1,2,\dots,n\}\).
  • Histories. \(H\) is the finite set of possible action histories, including the empty history \(\varnothing\).
  • Terminal histories. \(Z \subseteq H\) is the set of histories where the game ends.
  • Player function. For every nonterminal history \(h \in H \setminus Z\), \(P(h)\in N\) tells us which player moves at \(h\).
  • Available actions. \(A(h)\) is the set of feasible actions at history \(h\).
  • Payoffs. For each player \(i\), \(u_i: Z \to \mathbb{R}\) assigns a payoff to every terminal history.
Main picture. A dynamic game is a tree. Histories are nodes, actions are edges, terminal histories are leaves, and payoffs sit at the leaves.

Complete information versus perfect information

Concept Meaning Why it matters here
Complete information All players know the game structure, action sets, and payoff functions. This is the benchmark setting for the note.
Perfect information Whenever a player moves, that player knows the entire past history of play. Backward induction applies most cleanly in this case.

Complete information does not automatically imply perfect information. A game can have known payoffs and known rules, yet still hide earlier actions from later movers. In this note, backward induction is presented for the perfect-information case, while the one-shot deviation principle is stated in the broader finite-horizon perfect-recall setting.

Histories and successors

If \(h\) is a history and \(a \in A(h)\), then \((h,a)\) or simply \(ha\) denotes the successor history obtained by taking action \(a\) after \(h\).

A history \(z\in Z\) is terminal if no further action is available. A nonterminal history is a decision node.

Finite horizon

"Finite" means the tree has finitely many histories, so the game ends after finitely many moves. This is exactly what allows recursive arguments such as backward induction.

root history h1 history h2 terminal z1 terminal z2 terminal z3 terminal z4 action a action b continuation action payoffs u(z1)
A finite dynamic game is a finite rooted tree. Solving the game means identifying which path rational players choose when every continuation problem is taken seriously.

2. Strategies, subgames, and subgame-perfect equilibrium full contingent plans

Strategy as a complete plan

In a dynamic game, a player's strategy is not just the action actually observed on the realized path. It is a complete contingent plan that tells the player what to do at every decision node where that player might move.

Pure strategy

Let \(H_i = \{h \in H \setminus Z : P(h)=i\}\) be the set of decision histories of player \(i\). A pure strategy for player \(i\) is a function

\[ s_i : H_i \to \bigcup_{h \in H_i} A(h) \]

such that \(s_i(h) \in A(h)\) for every \(h \in H_i\).

The full strategy profile is \(s=(s_1,\dots,s_n)\). Because the game tree is finite, any profile determines a unique terminal history \(z(s)\), and each player receives payoff \(u_i(z(s))\).

Subgames

A subgame starting at history \(h\) is the continuation game obtained by taking \(h\) as the new root and keeping all later feasible actions and payoffs. In perfect-information trees, every decision node defines a subgame.

Subgame-perfect equilibrium

A strategy profile \(s^*\) is a subgame-perfect equilibrium if it induces a Nash equilibrium in every subgame of the original game.

Interpretation. Subgame perfection rules out empty threats. A strategy profile is acceptable only if it prescribes rational continuation behavior after every possible history, including off-path histories.

Why Nash equilibrium is not enough

Ordinary Nash equilibrium checks whether a player wants to deviate from the full strategy profile before the game starts. It does not force off-path continuation actions to be optimal when those nodes are actually reached.

Why subgame perfection fixes the issue

Subgame perfection asks for Nash equilibrium inside every continuation game. That means every promised punishment or reward must still be optimal when the continuation is reached.

Mini example: why Nash can allow a bad threat.

Consider the following entry game, with payoffs listed as \((\text{entrant},\text{incumbent})\):

Entrant's move Incumbent's continuation Outcome payoff
Out Game ends \((0,2)\)
Enter Fight \((-1,-1)\)
Enter Accommodate \((2,1)\)

Threat that can support Nash: the incumbent says, "If you enter, I will fight." If the entrant believes this, staying out is optimal because \(0 > -1\).

What actually happens in the entry subgame: once entry occurs, the incumbent compares \(-1\) from Fight with \(1\) from Accommodate, so Accommodate is better.

Conclusion: the threat to Fight is non-credible. Nash equilibrium may still allow the profile because the off-path move is never tested, but subgame perfection rejects it because the continuation action is not optimal when the subgame is reached.

A strategy is a full plan. A subgame-perfect equilibrium is a plan that remains optimal after every history, not only on the equilibrium path.

3. Backward induction recursive solution method

Backward induction is the standard solution method for finite dynamic games with perfect information. The idea is simple: solve the last decision problem first, replace it by its optimal payoff, and keep moving backward until the root is solved.

Recursive value rule

For each terminal history \(z\in Z\), define

\[ V_i(z)=u_i(z). \]

If \(h\) is a nonterminal history and player \(i=P(h)\) moves at \(h\), then define

\[ V_i(h)=\max_{a \in A(h)} V_i(ha). \]

An optimal action at \(h\) is any

\[ a^*(h) \in \arg\max_{a \in A(h)} V_i(ha). \]
Backward-induction theorem. Every finite perfect-information game has a subgame-perfect equilibrium, and any strategy profile generated by backward induction is subgame perfect.

Algorithm

1
Start from the last movers. At every decision node just before a terminal node, choose the action that gives the moving player the highest payoff.
2
Collapse solved branches. Replace each solved continuation by the payoff that rational play would deliver from that node onward.
3
Repeat one layer earlier. Treat the reduced tree as a smaller game and solve the next decision problem.
4
Continue to the root. The resulting action at each node defines a full strategy profile.

Why the method works

Local rationality at the end. The last mover faces an ordinary static choice, so selecting a best action is immediate.
Recursion on subgames. Once every later subgame has been solved, the current player compares continuation values that are already rationally justified.
Important qualification. If a player has several payoff-maximizing actions at some node, backward induction can generate several subgame-perfect equilibria. The method gives a set of solutions when ties occur.

4. Worked cases: entry deterrence and pirate gold classic examples

Consider a market-entry game. First the entrant chooses whether to stay out or enter. If entry occurs, the incumbent chooses whether to fight or accommodate.

E I Out Enter Fight Accommodate (-1, -1) (2, 1) (0, 2) Entrant Incumbent
Payoffs are listed as (entrant, incumbent). If entry occurs, the incumbent prefers Accommodate to Fight because \(1 > -1\).

Step 1: Solve the last decision

If the entrant has already chosen Enter, the incumbent compares:

\[ u_I(\text{Fight})=-1, \qquad u_I(\text{Accommodate})=1. \]

So the incumbent chooses Accommodate.

Step 2: Move back to the first decision

The entrant now anticipates that Enter will be followed by Accommodate, so the relevant comparison is:

\[ u_E(\text{Out})=0, \qquad u_E(\text{Enter then Accommodate})=2. \]

Therefore the entrant chooses Enter.

Backward-induction solution. The subgame-perfect equilibrium is \[ (\text{Enter},\text{Accommodate}). \]

Why this example matters

In the strategic form, the profile \((\text{Out},\text{Fight})\) can be a Nash equilibrium: if the entrant expects Fight after entry, staying Out is optimal, and if the entrant stays Out, the incumbent never has to carry out Fight. But this threat is not credible.

Non-credible threat. Once the subgame after Enter is actually reached, Fight is strictly worse than Accommodate for the incumbent. Subgame perfection removes \((\text{Out},\text{Fight})\) for exactly this reason.
Question Answer in this game
What does the last mover do? The incumbent accommodates because \(1 > -1\).
What does the first mover infer? Entry leads to payoff \(2\), which is better than staying out and getting \(0\).
Why is the solution subgame perfect? It is optimal in the whole game and in the subgame after entry.

Another classic backward-induction example: pirate gold

This is one of the best-known dynamic-game puzzles. Three pirates, ordered by seniority as \(A\), \(B\), and \(C\), must divide \(100\) gold coins.

Rules
  • The most senior surviving pirate proposes a division of the \(100\) coins.
  • All surviving pirates vote on the proposal.
  • The proposal passes if it gets at least half of the votes.
  • If the proposal fails, the proposer is eliminated, and the next pirate makes a proposal.
  • Preferences are lexicographic: each pirate first wants to survive, then wants more gold, and if still tied prefers that more rivals be eliminated.

The key is to solve the game from the end.

Remaining pirates What the proposer needs Backward-induction outcome
Only \(C\) No vote problem remains \(C\) gets all \(100\) coins.
\(B\) and \(C\) At least half of \(2\) votes, so \(1\) vote is enough \(B\) votes for his own proposal and keeps all \(100\): \((0,100,0)\).
\(A\), \(B\), and \(C\) At least half of \(3\) votes, so \(2\) votes are needed \(A\) needs one more pirate besides himself.

Step 1: What happens if \(A\) is eliminated?

Then only \(B\) and \(C\) remain. From the table above, \(B\) can pass \((0,100,0)\) using his own vote. So:

\[ ext{If } A \text{ dies, then } B \text{ expects } 100 \text{ and } C \text{ expects } 0. \]

Step 2: What should \(A\) offer?

Since \(B\) expects \(100\) if \(A\) is eliminated, \(B\) will not support any proposal from \(A\) giving him less than \(100\). But \(C\) expects \(0\) if \(A\) is eliminated, so giving \(C\) just \(1\) coin is enough to make \(C\) prefer acceptance.

Backward-induction solution. Pirate \(A\) proposes \[ (99,0,1), \] and the proposal passes with votes from \(A\) and \(C\).
Why \(B\) votes no. If \(A\)'s proposal fails, \(B\) becomes proposer in the next round and gets all \(100\) coins.
Why \(C\) votes yes. If \(A\)'s proposal fails, \(C\) gets \(0\) in the continuation game, so receiving \(1\) coin now is better.
Why this example is famous. The result looks surprising at first, but it follows mechanically from backward induction. Each pirate reasons from the future continuation game, not from vague fairness.

5. One-shot deviation principle local checks imply global optimality

Backward induction is a constructive method. The one-shot deviation principle is mainly a verification tool. Instead of checking every possible alternative strategy, it says we only need to check whether some player can gain by changing the action at one decision point and leaving the rest of the strategy unchanged.

One-shot deviation

Fix a player \(i\) and a strategy profile \(s=(s_i,s_{-i})\). A strategy \(s_i'\) is a one-shot deviation from \(s_i\) at decision point \(h\in H_i\) if

  • \(s_i'(h) \neq s_i(h)\), and
  • \(s_i'(h') = s_i(h')\) for every other decision point \(h' \neq h\).
Principle

In a finite-horizon extensive-form game with perfect recall, a strategy profile \(s\) is subgame perfect if and only if there is no profitable one-shot deviation in any subgame.

Write \(U_i(s \mid h)\) for player \(i\)'s continuation payoff from history \(h\) onward when strategy profile \(s\) is played. Then the condition is:

\[ U_i(s \mid h) \ge U_i(s_i', s_{-i} \mid h) \]

for every player \(i\), every decision point \(h\in H_i\), and every one-shot deviation \(s_i'\) at \(h\).

Reading the theorem in plain English. To verify subgame perfection, you do not need to inspect every complicated re-optimization plan. You only need to ask: "Can some player improve by changing exactly one move at exactly one place?"

Why a local test is enough

1
Suppose a profitable full deviation exists. Then there is a first decision point where the deviating strategy differs from the original strategy.
2
Focus on that first difference. If changing the full future plan increases payoff, then changing the action at that first difference must already be part of what creates the gain.
3
Perfect recall prevents hidden self-contradictions. The player remembers earlier choices and information, so the continuation problem can be decomposed consistently.
4
Therefore one local profitable deviation exists. So if no profitable one-shot deviation exists anywhere, no profitable full deviation exists either.
Backward induction asks: "What action is optimal when I solve the tree from the end?"
One-shot deviation asks: "Given a candidate equilibrium, can any player improve by changing one move at one history?"
Common use case. In repeated games and finite-horizon dynamic games, the one-shot deviation principle is often the fastest way to verify a proposed subgame-perfect equilibrium.

6. Worked case: two-period Prisoner's Dilemma case + explanation

Take the standard stage game:

Player 1 \ Player 2 C D
C (3,3) (0,5)
D (5,0) (1,1)

Now repeat this game for two periods, with total payoff equal to the sum of the two stage payoffs. The game has complete information and finite horizon. Histories after period 1 are observed, so the game has perfect recall.

A candidate profile that fails

Suppose both players plan to choose \(C\) in both periods after every history. This looks cooperative, but it is not subgame perfect.

Profitable one-shot deviation at the last period

At any history before period 2, if the opponent plans to play \(C\), then

\[ u_i(D,C)=5 \quad \text{and} \quad u_i(C,C)=3. \]

So switching from \(C\) to \(D\) at that single decision point raises the player's payoff by \(2\).

Conclusion. "Always cooperate" fails the one-shot deviation test in the final period, so it cannot be a subgame-perfect equilibrium.

The actual subgame-perfect equilibrium

Consider the strategy profile \(s^D\) in which each player chooses \(D\) after every history in both periods.

Period 2 check

At the last period, \(D\) strictly dominates \(C\):

\[ u_i(D,C)=5 > 3=u_i(C,C), \qquad u_i(D,D)=1 > 0=u_i(C,D). \]

So no player wants to deviate from \(D\) to \(C\) in period 2.

Period 1 check

Because continuation play in period 2 is already fixed at \((D,D)\), the period-1 comparison is just the stage-game comparison plus the same continuation payoff on both sides. So \(D\) again strictly dominates \(C\).

Verified equilibrium. The strategy profile "defect after every history" has no profitable one-shot deviation. By the one-shot deviation principle, it is a subgame-perfect equilibrium.
Period 2 D strictly dominates C so final play is D,D Roll back continuation is fixed at D,D in period 2 Period 1 D still dominates C so play D,D again
Backward induction and the one-shot deviation principle point to the same conclusion in the finite repeated Prisoner's Dilemma: defect in the last period, then defect earlier as well.

What this example teaches

Finite horizon matters. At the last period there is no future punishment, so the incentive to defect becomes immediate and unavoidable.
One-shot deviation is efficient. You do not need to enumerate every possible contingent strategy in both periods.
Local rationality propagates backward. Once the final period is solved, earlier periods inherit the same logic.

7. Summary and checklist what to remember

Tool Best use Main output
Backward induction Construct an equilibrium in a finite perfect-information game A subgame-perfect equilibrium obtained by recursive optimization
One-shot deviation principle Verify a candidate equilibrium in a finite-horizon game with perfect recall A quick test for subgame perfection
Backward induction builds the equilibrium. The one-shot deviation principle checks the equilibrium.