Week 2 — Workshop

Lean Proof of the Stronger No-Free-Lunch Theorem

Topics in Learning Theory · Step-by-step walkthrough of the Week 1 formalization

This workshop page is for presenting the Lean proof slowly. The main point is not to learn all of Lean at once; it is to see how the paper proof breaks into a small number of definitions and one core induction argument.

1. The theorem and the wrapper proof

The workshop should begin with the mathematical statement itself. Once that is on the page, we can compare the paper proof to the Lean wrapper proof and then unpack the helper theorems underneath.

Stronger Form Adversarial Online No-Free-Lunch

Let $\mathcal{X}$ be any instance space and let $A$ be any deterministic learner. For every finite sequence $x_1, \ldots, x_T$ of distinct points in $\mathcal{X}$, there exists a function $f : \mathcal{X} \to \{0, 1\}$ such that $A$ makes $T$ mistakes on the sequence $(x_t,\; y_t = f(x_t))_{t=1}^T$.

Paper proof

Fix any distinct sequence $x_1, \ldots, x_T$. We define the labels one round at a time. In round $t$, after observing the history $H_{t-1} = ((x_1, f(x_1)), \ldots, (x_{t-1}, f(x_{t-1})))$, the learner predicts $\hat{y}_t = A(H_{t-1})(x_t)$. We then set

$$ f(x_t) \;:=\; 1 - \hat{y}_t. $$

Because the points $x_1, \ldots, x_T$ are distinct, no point is labeled twice, so this prescription is well-defined on the queried set $\{x_1, \ldots, x_T\}$. Extend $f$ arbitrarily to the rest of $\mathcal{X}$.

By construction, at every round $t$ we have $\hat{y}_t \neq f(x_t)$, so the learner makes a mistake on every round. Hence the total number of mistakes is $T$.
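For concreteness, suppose $T = 2$ and the learner predicts $\hat{y}_t = 0$ on every history (a toy learner, chosen only to illustrate the construction). Then

$$ \hat{y}_1 = 0 \;\Rightarrow\; f(x_1) = 1, \qquad \hat{y}_2 = 0 \;\Rightarrow\; f(x_2) = 1, $$

so the learner is wrong on both rounds and the mistake count is $2 = T$.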

Now we switch to the Lean version of exactly the same theorem. The Lean wrapper proof is short, so it is a good place to start the formal walkthrough.

Here “transcript” just means a labeled sequence of examples, such as [(x₁, y₁), ..., (x_T, y_T)]. Such a transcript appears in two ways on this page: labeled by one fixed global function (roundsFromFunction), or labeled online, round by round, by the adversary (adversarialRounds).

Section 4 proves that on a distinct list, these two constructions give exactly the same labeled sequence.

Each line of the Lean proof is shown below, followed by a short explanation.

theorem noFreeLunch_distinctSequence {α : Type} [DecidableEq α]

The theorem declaration

Meaning. We are stating the stronger NFL theorem for an arbitrary instance space where equality can be checked, which is needed later for definitions like if z = x then y else f z.

  • theorem: declare a named result in Lean.
  • noFreeLunch_distinctSequence: the name of this result.
  • {α : Type}: an arbitrary instance space α; the braces mean this argument is implicit.
  • [DecidableEq α]: Lean can decide whether two points of α are equal.

Delayed to Section 2. Section 2 explains why [DecidableEq α] is needed when Lean defines adversarialLabeling.

    (A : Learner α) :

Fix the learner

Meaning. The theorem is about an arbitrary learner A. Once we fix the learner, the rest of the theorem will build an adversarial target against it.

  • (A : Learner α): an explicit theorem input, namely a learner on α.
  • Learner α: a function that sees the current history and next point, then predicts a bit.
  • The ending colon means the statement continues on the next line.

Delayed to Section 2. Section 2 spells out the Lean meaning of Learner α and the basic objects it acts on.

    ∀ {xs : List α}, xs.Nodup →

Take any distinct sequence

Meaning. This is the stronger quantifier order: give me any distinct sequence of points, and I will produce a bad target for that very sequence.

  • ∀: read this as “for every.”
  • {xs : List α}: an arbitrary list of points in the instance space; braces again mean the argument is implicit.
  • xs.Nodup: the list has no repeated points.
  • →: implication, so this distinctness assumption becomes available in the proof.

    ∃ f : α → Bool, mistakeCount A (roundsFromFunction f xs) = xs.length := by

The conclusion and the start of the proof

Meaning. The whole theorem is saying there exists one global labeling function that makes the learner wrong on every point in the chosen distinct sequence.

  • ∃ f : α → Bool: we must produce one global target function f.
  • roundsFromFunction f xs: label each point in xs using that single function.
  • mistakeCount A (roundsFromFunction f xs): count how many mistakes learner A makes on the resulting transcript.
  • = xs.length: the number of mistakes equals the number of rounds.
  • := by: start the tactic proof.

Delayed to Section 2. Section 2 defines roundsFromFunction in words and explains mistakeCount via mistakeCountOn.

  intro xs hxs

Introduce the list and its distinctness proof

Meaning. Before this line, the goal says “for every distinct list xs, prove ...”. After this line, we have fixed one arbitrary list xs and assumed it is distinct, so the rest of the proof can use both xs and hxs directly.

  • intro: move the next binder in the goal into the local proof context.
  • The first intro consumes the ∀ {xs : List α} from the theorem statement and names that arbitrary list xs.
  • The second intro consumes the implication xs.Nodup → ... and names that assumption hxs.
  • xs: the particular list we are now working with.
  • hxs: the local name for the proof that xs.Nodup holds.

Goal shape before this line.

⊢ ∀ {xs : List α}, xs.Nodup →
    ∃ f : α → Bool,
      mistakeCount A (roundsFromFunction f xs) = xs.length

Goal shape after this line.

xs : List α
hxs : xs.Nodup
⊢ ∃ f : α → Bool,
    mistakeCount A (roundsFromFunction f xs) = xs.length

  refine ⟨adversarialLabeling A [] xs, ?_⟩

Choose the witness function

Meaning. We choose the adversarially defined global function as the target f. The only thing left is to prove that this choice really forces xs.length mistakes.

  • refine: partially fill in the object Lean wants, leaving the rest as a new goal.
  • ⟨adversarialLabeling A [] xs, ?_⟩: tuple / witness notation used to satisfy the existential goal.
  • adversarialLabeling A [] xs: the chosen witness function.
  • ?_: a placeholder for the proof obligation that remains.

Delayed to Section 2. Section 2 defines adversarialLabeling line by line and explains how it packages online adversarial choices into one fixed function.

Goal after this line.

⊢ mistakeCount A
    (roundsFromFunction (adversarialLabeling A [] xs) xs) = xs.length

  rw [roundsFromFunction_adversarialLabeling (A := A) (hist := []) hxs]

Rewrite by the bridge theorem

Meaning. Here the subexpression being rewritten is roundsFromFunction (adversarialLabeling A [] xs) xs, which appears inside mistakeCount A (roundsFromFunction (adversarialLabeling A [] xs) xs). The rw line replaces that subexpression by adversarialRounds A [] xs, so the goal switches from “count mistakes on the list labeled by one fixed function” to “count mistakes on the list labeled online by the adversary.”

  • rw: rewrite the current goal using an equality.
  • roundsFromFunction_adversarialLabeling: the bridge theorem that says the “fixed function labeling” and the “online adversarial labeling” produce the same labeled sequence.
  • (A := A), (hist := []): named arguments telling Lean exactly which parameters to use.
  • hxs: the duplicate-free assumption required by the bridge theorem.
  • In Lean, rw searches the goal for a subexpression matching the theorem’s left-hand side and replaces it by the right-hand side.

Delayed to Section 4. Section 4 proves exactly why these two labeled sequences are equal when xs has no duplicates.

Goal after this line.

⊢ mistakeCount A (adversarialRounds A [] xs) = xs.length

  simpa [mistakeCount] using mistakeCountOn_adversarialRounds A [] xs

Finish by the core theorem

Meaning. After the rewrite, the goal is exactly the core adversarial theorem in disguise, so one final simplification closes the wrapper proof.

  • using mistakeCountOn_adversarialRounds A [] xs: specify the theorem Lean should use to solve the current goal.
  • mistakeCountOn_adversarialRounds A [] xs: the core induction theorem specialized to the empty history.
  • simpa [mistakeCount]: simplify, unfolding the definition of mistakeCount, until the goal matches exactly.

Delayed to Sections 2 and 3. Section 2 explains why mistakeCount is just the empty-history wrapper around mistakeCountOn. Section 3 proves mistakeCountOn_adversarialRounds.

Roadmap Why the wrapper proof is only eight lines

The wrapper proof is short because it delegates almost everything to two earlier theorems. In presentation order, there are really three ingredients:

  1. The basic objects: histories, learners, labeled sequences, and mistake counts.
  2. The core induction theorem: the adversarial transcript causes one mistake per round.
  3. The bridge theorem: on a distinct list, the online adversarial labeling agrees with the labeling produced by one fixed global function.

Once those three pieces are in place, the final theorem becomes a short wrapper.

2. Notation and objects

Before reading the proofs, we only need four names that keep showing up: mistakeCountOn, mistakeCount, adversarialLabeling, and adversarialRounds.

The three Lean definitions below are the ones that the later proofs actually use. They are separate on purpose: first the recursive counter, then its empty-history wrapper, then the global target function.
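The underlying types History α and Learner α are not reproduced in this walkthrough. As a sketch consistent with how they are used here (the actual Week 1 definitions may be phrased differently):

```lean
-- Hypothetical sketches of the basic objects, inferred from how the
-- walkthrough uses them; the real Week 1 file may differ.
abbrev History (α : Type) : Type := List (α × Bool)      -- labeled examples seen so far
abbrev Learner (α : Type) : Type := History α → α → Bool -- history and next point ↦ predicted bit
```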

Recursive counter. mistakeCountOn is the real recursive definition. It replays a transcript from an arbitrary starting history and counts mistakes one round at a time.

def mistakeCountOn {α : Type} (A : Learner α) : History α → History α → Nat

The main counting definition header

Meaning. This is the real recursive definition that counts mistakes. It is more general than mistakeCount because it can start from any current history, not just the empty one.

  • def: introduce a definition in Lean.
  • mistakeCountOn: the name of the more general counting function.
  • {α : Type}: an arbitrary instance space.
  • (A : Learner α): the learner whose mistakes we are counting.
  • History α → History α → Nat: the function takes a starting history, then a transcript to replay, and returns a natural number.

  | _, [] => 0

The empty-transcript case

Meaning. If the transcript is empty, there are no future predictions to evaluate, so the mistake count is zero.

  • |: start one pattern-matching branch.
  • _, []: any starting history, but no rounds left to replay.
  • 0: return zero mistakes.

  | hist, (x, y) :: rounds =>

The recursive case header

Meaning. In the nonempty case, we inspect the next labeled example, decide whether it is a mistake, and then continue on the rest of the transcript.

  • hist: the current history before replaying the next round.
  • (x, y) :: rounds: the next labeled example is (x, y), followed by the remaining transcript rounds.
  • =>: begin the body of the recursive case.

    let miss : Nat := if A hist x = y then 0 else 1

Count the current round

Meaning. This line computes the mistake indicator for the current round: one if the learner’s prediction disagrees with the label, zero otherwise.

  • let miss : Nat := if A hist x = y then 0 else 1: create a local natural number called miss.
  • A hist x: the learner’s prediction on point x given the current history.
  • if A hist x = y then 0 else 1: contribute 0 if the learner is correct and 1 if it is wrong.

    miss + mistakeCountOn A (hist ++ [(x, y)]) rounds

Recurse on the remaining transcript

Meaning. The total mistake count is “mistake on this round” plus “mistakes on the rest,” where the rest is evaluated from the updated history.

  • miss + mistakeCountOn A (hist ++ [(x, y)]) rounds: add the current round’s contribution to the future total.
  • hist ++ [(x, y)]: update the history by appending the round that was just replayed.
  • mistakeCountOn A (hist ++ [(x, y)]) rounds: continue counting mistakes on the tail of the transcript.
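As a toy check of the recursion, here is a hypothetical always-false learner evaluated on a two-round transcript (this example assumes Learner Nat unfolds to History Nat → Nat → Bool, as sketched above; it is not part of the Week 1 file):

```lean
-- A toy learner on Nat that predicts false on every point.
def alwaysFalse : Learner Nat := fun _ _ => false

-- Round 1: prediction false ≠ label true  → 1 mistake.
-- Round 2: prediction false = label false → 0 mistakes.
#eval mistakeCountOn alwaysFalse [] [(0, true), (1, false)]  -- 1
```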

Empty-history wrapper. mistakeCount is the simpler interface used in the final theorem. It just specializes mistakeCountOn to the case where replay starts from [].

def mistakeCount {α : Type} (A : Learner α) (rounds : History α) : Nat :=

The wrapper definition header

Meaning. This line says that mistakeCount is a function taking a learner and a transcript and returning the total number of mistakes.

  • def: introduce a definition in Lean.
  • mistakeCount: the name of the wrapper function.
  • {α : Type}: an arbitrary instance space.
  • (A : Learner α): the learner whose mistakes we are counting.
  • (rounds : History α): the labeled transcript we will replay.
  • : Nat: the output is a natural number.

  mistakeCountOn A [] rounds

What the definition actually does

Meaning. This definition is just a wrapper: mistakeCount means “count mistakes starting from empty history.” That is exactly why the final wrapper proof can finish with simpa [mistakeCount].

  • mistakeCountOn: the more general function that counts mistakes from an arbitrary starting history.
  • []: the empty history.
  • rounds: the transcript we are replaying.

Global target construction. adversarialLabeling is the definition that packages the online adversary’s round-by-round labels into one ordinary labeling function on points.

def adversarialLabeling {α : Type} [DecidableEq α] (A : Learner α) :

The definition header

Meaning. This line begins the definition of the single global target function that will imitate the online adversary’s choices.

  • def: introduce a definition.
  • adversarialLabeling: the name of the function.
  • {α : Type}: the instance space.
  • [DecidableEq α]: equality on α can be decided, which is needed for the later if z = x then y else f z.
  • (A : Learner α): the learner we are adversarially targeting.

    History α → List α → α → Bool

The type of the function

Meaning. Given a history and a future list, adversarialLabeling returns an ordinary labeling function on points. So its output is itself a function α → Bool.

  • History α: first input, the current history.
  • List α: second input, the future query list.
  • α: third input, a point whose label we want to know.
  • Bool: output, the label assigned to that point.

  | _, [] => fun _ => false

The base case

Meaning. If there are no future query points left, the definition returns a default function. The actual value outside the queried list does not matter for the theorem, so Lean just uses false.

  • |: start one pattern-matching branch.
  • _, []: any history, but an empty future list.
  • fun _ => false: the constant-false function.

  | hist, x :: xs =>

The recursive case header

Meaning. This is the interesting case: there is a next query point x, so the adversarial labeling has to decide what label to assign there and how to continue on the tail.

  • hist: the current history.
  • x :: xs: the future list split into head x and tail xs.
  • =>: start the body of the recursive case.

    let y := flip (A hist x)

Choose the adversarial label

Meaning. This line is the adversarial move itself: define the true label to be the opposite of what the learner predicts on the current round.

  • let y := flip (A hist x): create a local name y.
  • A hist x: the learner’s prediction at point x given the current history.
  • flip: change that predicted bit to the opposite bit.

    let f := adversarialLabeling A (hist ++ [(x, y)]) xs

Recurse on the tail

Meaning. After labeling the current point, the definition builds the labeling function for the rest of the sequence from the updated history, exactly matching the online process.

  • let f := adversarialLabeling A (hist ++ [(x, y)]) xs: create a local name for the recursively defined tail labeling.
  • hist ++ [(x, y)]: update the history by appending the new labeled example.
  • xs: recurse on the remaining query points.

    fun z => if z = x then y else f z

Assemble the global function

Meaning. This is how the adaptive labels are packaged into one global target function: the current point gets its adversarial label, and every other point is handled by the recursive function for later rounds.

  • fun z => if z = x then y else f z: define the output function on an arbitrary point z.
  • if z = x then y: at the current queried point, use the adversarial label chosen on this round.
  • else f z: on every other point, defer to the recursively built tail function.
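To make the two branches concrete, here is a hypothetical evaluation against a toy always-false learner (this assumes flip is boolean negation and Learner Nat unfolds as sketched earlier; neither assumption is shown in the Week 1 file):

```lean
-- A toy learner on Nat that predicts false on every point.
def constFalse : Learner Nat := fun _ _ => false

-- The head point 0 gets the flipped prediction:
#eval adversarialLabeling constFalse [] [0, 1] 0  -- true
-- A point never queried falls through to the constant-false default:
#eval adversarialLabeling constFalse [] [0, 1] 2  -- false
```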

3. The core induction proof

Core Theorem The adversarial transcript causes one mistake per round

The proof is shown below one line at a time, each followed by an explanation of the induction argument.

theorem mistakeCountOn_adversarialRounds {α : Type} (A : Learner α)

The core theorem

Meaning. This is the engine of the whole development: before introducing a global function, we first prove that the online adversary makes the learner wrong on every round.

  • theorem: declare a named result.
  • mistakeCountOn_adversarialRounds: the name of the core theorem.
  • {α : Type}: an arbitrary instance space.
  • (A : Learner α): fix a learner on that instance space.

    (hist : History α) (xs : List α) :

Why is hist a parameter?

Meaning. The theorem is stated from an arbitrary current history, not just the empty one. That stronger form is what makes the induction step work after the first round updates the history.

  • (hist : History α): the current labeled history seen so far.
  • (xs : List α): the future query sequence.
  • The ending colon means the actual equality statement comes next.

    mistakeCountOn A hist (adversarialRounds A hist xs) = xs.length := by

The equality we want

Meaning. The theorem claims that the adversarial transcript causes exactly one mistake per round.

  • adversarialRounds A hist xs: build the transcript by flipping the learner’s prediction at each step.
  • mistakeCountOn A hist (adversarialRounds A hist xs): replay that transcript and count mistakes from starting history hist.
  • xs.length: the number of rounds in the future list.
  • := by: begin the tactic proof.

  induction xs generalizing hist with

The key induction move

Meaning. This is the key proof move. After one round, the history changes, so the induction hypothesis must already be strong enough to work for any current history.

  • induction xs: prove the theorem by induction on the future list.
  • generalizing hist: do not freeze the current history when building the induction hypothesis.
  • with: the following lines will spell out the nil and cons branches.

  | nil =>

The base case header

Meaning. This is the base case xs = [], where there are no future rounds to analyze.

  • |: start one branch of the induction proof.
  • nil: Lean’s constructor name for the empty list.
  • =>: begin the proof of this branch.

    simp [adversarialRounds, mistakeCountOn]

Finish the base case by simplification

Meaning. On the empty list, the adversarial transcript is empty and the mistake count is zero, so the base case collapses immediately to 0 = 0.

  • simp: simplify the current goal.
  • [adversarialRounds, mistakeCountOn]: unfold exactly these definitions while simplifying.

  | cons x xs ih =>

The inductive case header

Meaning. This is the nonempty case. We split off the first round and then use induction on the remaining rounds.

  • cons: Lean’s constructor name for a nonempty list.
  • x: the head of the list.
  • xs: the tail of the list.
  • ih: the induction hypothesis for the tail.

    have ih' : mistakeCountOn A (hist ++ [(x, flip (A hist x))]) (adversarialRounds A (hist ++ [(x, flip (A hist x))]) xs) = xs.length := ih (hist ++ [(x, flip (A hist x))])

Specialize the induction hypothesis

Meaning. We specialize the induction hypothesis to the history after round one. That gives exactly the statement we need for the tail of the game.

  • have: introduce a new local fact.
  • ih': the name of that new fact.
  • := ih (hist ++ [(x, flip (A hist x))]): prove the new fact by applying the induction hypothesis to the updated history.
  • hist ++ [(x, flip (A hist x))]: the old history with the first adversarially labeled example appended.

    calc

Start a calculation chain

Meaning. From here on, the proof computes the total mistake count step by step until it becomes the length of the list.

  • calc: start a structured chain of equalities.
  • Each following line gives the next expression and a proof of that step.

      mistakeCountOn A hist (adversarialRounds A hist (x :: xs)) = 1 + mistakeCountOn A (hist ++ [(x, flip (A hist x))]) (adversarialRounds A (hist ++ [(x, flip (A hist x))]) xs) := by simp [adversarialRounds, mistakeCountOn]

Separate the first round from the tail

Meaning. Unfolding the definitions shows that the first round contributes exactly one mistake, and the rest of the transcript is the same problem again from the updated history.

  • mistakeCountOn A hist (adversarialRounds A hist (x :: xs)): the total mistake count on the whole nonempty list.
  • = 1 + mistakeCountOn A (hist ++ [(x, flip (A hist x))]) (adversarialRounds A (hist ++ [(x, flip (A hist x))]) xs): split the total into the first round plus the tail.
  • := by simp [adversarialRounds, mistakeCountOn]: justify this step by unfolding the two definitions.

      _ = 1 + xs.length := by rw [ih']

Use the induction hypothesis

Meaning. This line replaces the tail mistake count by xs.length, because the induction hypothesis already proved exactly that for the updated history.

  • rw: rewrite using an equality.
  • ih': the specialized induction hypothesis from the previous line.

      _ = (x :: xs).length := by simp [Nat.add_comm]

Finish the arithmetic

Meaning. After rewriting, the expression is 1 + xs.length, and this is exactly the same as (x :: xs).length. That completes the induction step.

  • simp: simplify the arithmetic goal.
  • [Nat.add_comm]: use commutativity of addition while simplifying.

Takeaway Why the induction works

The conceptual bottleneck is the phrase generalizing hist. After the first adversarial round, the current history is no longer hist; it becomes hist ++ [(x, flip (A hist x))]. So the induction hypothesis must already be strong enough to apply to an updated history, not just the original one.

Once that is set up, the proof has a simple shape: the empty list gives zero mistakes, and in the nonempty case the first round contributes one mistake while the induction hypothesis handles the tail. That is why the final equality is exactly “mistake count = list length.”
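Concretely, in the cons branch the generalized induction hypothesis has roughly this shape, quantified over the history (a sketch of the local context, not a line of the Week 1 file):

```lean
ih : ∀ (hist : History α),
    mistakeCountOn A hist (adversarialRounds A hist xs) = xs.length
```

Without generalizing hist, the hypothesis would be frozen at the original hist and could not be applied to hist ++ [(x, flip (A hist x))].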

4. From an online labeled sequence to a global function

The core induction theorem works directly with the labeled sequence built online, one round at a time. The stronger Week 1 statement asks for a single target function f : α → Bool, so we need one extra layer that packages those round-by-round labels into a global function.

adversarialLabeling defines a global function by recursion on the list xs. At the current point x, it records the adversarial label. For every other point, it delegates to the labeling function built for the tail.
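Neither roundsFromFunction nor adversarialRounds is displayed in this walkthrough. As a hedged sketch consistent with how both are described here (the actual Week 1 definitions may differ in detail):

```lean
-- Hypothetical sketches, inferred from this page's descriptions.

-- Label each point of the list with one fixed function f.
def roundsFromFunction {α : Type} (f : α → Bool) : List α → History α
  | [] => []
  | x :: xs => (x, f x) :: roundsFromFunction f xs

-- Build the transcript online, flipping the learner's prediction each round.
def adversarialRounds {α : Type} (A : Learner α) : History α → List α → History α
  | _, [] => []
  | hist, x :: xs =>
    let y := flip (A hist x)
    (x, y) :: adversarialRounds A (hist ++ [(x, y)]) xs
```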

The bridge theorem is shown below line by line, with explanations of how the online adversary's labels become one global function.

theorem roundsFromFunction_adversarialLabeling {α : Type} [DecidableEq α]

The bridge theorem declaration

Meaning. This theorem connects two ways of producing the same labeled sequence: labeling by one fixed function, and labeling online round by round against the learner.

  • theorem: declare a named result.
  • roundsFromFunction_adversarialLabeling: the name of the bridge theorem.
  • {α : Type}: an arbitrary instance space.
  • [DecidableEq α]: equality on that space can be decided.

    (A : Learner α) (hist : History α) :

Why do we still keep hist?

Meaning. Just like the core theorem, the bridge theorem must work from an arbitrary current history, because the recursive proof will keep updating that history.

  • (A : Learner α): the learner under attack.
  • (hist : History α): the current history from which the comparison starts.
  • The ending colon means the statement continues below.

    ∀ {xs : List α}, xs.Nodup →

Distinctness is the crucial hypothesis

Meaning. Distinctness is exactly what makes it possible for one global function to agree with the online round-by-round adversary on the entire list.

  • ∀ {xs : List α}: take an arbitrary list of query points.
  • xs.Nodup: assume the list has no duplicates.
  • →: under that assumption, prove the desired equality.

    roundsFromFunction (adversarialLabeling A hist xs) xs = adversarialRounds A hist xs := by

The equality to prove

Meaning. The theorem says that on a distinct list, these two constructions produce exactly the same labeled sequence.

  • adversarialLabeling A hist xs: the recursively defined global target function.
  • roundsFromFunction (adversarialLabeling A hist xs) xs: label the list using that single global function.
  • adversarialRounds A hist xs: build the labeled sequence directly by the online adversary.
  • := by: begin the tactic proof.

  intro xs hxs

Introduce the list and its hypothesis

Meaning. This line changes the bridge theorem from a general statement about all distinct lists into a concrete local goal about one arbitrary distinct list. That is why the later induction can refer directly to xs and use hxs when it needs the distinctness assumption.

  • intro: move the next binder in the goal into the local proof context.
  • The first intro consumes the quantified list ∀ {xs : List α} and names it xs.
  • The second intro consumes the implication xs.Nodup → ... and names the distinctness proof hxs.
  • xs: the particular list we are now analyzing.
  • hxs: the proof that this list is duplicate-free.

  induction xs generalizing hist with

Induct on the list again

Meaning. The bridge theorem is also proved one point at a time, and again the history must stay general because it changes recursively.

  • induction xs: prove the bridge by induction on the query list.
  • generalizing hist: keep the current history arbitrary in the induction hypothesis.
  • with: the next lines give the nil and cons branches.

  | nil => simp [roundsFromFunction, adversarialLabeling, adversarialRounds]

The empty-list case

Meaning. On the empty list, both sides reduce to the empty transcript, so the bridge is immediate.

  • | nil =>: the base case where the list is empty.
  • simp: simplify by unfolding the listed definitions.
  • [roundsFromFunction, adversarialLabeling, adversarialRounds]: exactly the definitions needed in this branch.

  | cons x xs ih =>

The nonempty case

Meaning. We now compare the two transcript constructions on the first queried point and then reduce the rest of the argument to the tail.

  • cons: the nonempty-list constructor.
  • x: the head of the list.
  • xs: the tail of the list.
  • ih: the induction hypothesis for the tail.

    rcases List.nodup_cons.mp hxs with ⟨hnotin, hxs⟩

Split the distinctness hypothesis

Meaning. To use distinctness in the recursive step, we split it into the two pieces we actually need: the head does not reappear later, and the tail is itself distinct.

  • rcases List.nodup_cons.mp hxs with ⟨hnotin, hxs⟩: destruct the distinctness fact into named pieces.
  • List.nodup_cons.mp hxs: turn Nodup (x :: xs) into facts about the head and tail.
  • hnotin: the proof that x is not in the tail.
  • hxs: the proof that the tail is still duplicate-free.

    have htail : roundsFromFunction (adversarialLabeling A hist (x :: xs)) xs = roundsFromFunction (adversarialLabeling A (hist ++ [(x, flip (A hist x))]) xs) xs := by

The helper fact htail

Meaning. Once we move past the first point, the branch if z = x then y else f z is never triggered again, so the tail behaves exactly like the recursively defined tail labeling.

  • have: introduce a local helper theorem.
  • htail: the name of that helper theorem.
  • The statement says the tail labeling ignores the special branch created for the head x; its proof is omitted here.
  • The key input is hnotin, which says no tail point equals x.

    calc

Start the calculation

Meaning. The rest of the proof now proceeds by aligning the two transcripts piece by piece.

  • calc: begin a chained equality proof.
  • Each following step will rewrite the transcript into a closer and closer match.

      _ = (x, flip (A hist x)) :: roundsFromFunction (adversarialLabeling A (hist ++ [(x, flip (A hist x))]) xs) xs := by rw [htail]

Use htail

Meaning. This step uses the helper fact htail to replace the tail of the global-function transcript by the recursively defined tail transcript.

  • rw [htail]: replace the tail labeling by the recursively defined tail labeling.

      _ = (x, flip (A hist x)) :: adversarialRounds A (hist ++ [(x, flip (A hist x))]) xs := by rw [ih (hist := hist ++ [(x, flip (A hist x))]) hxs]

Use the induction hypothesis on the tail

Meaning. After replacing the tail labeling by the recursive tail labeling, we use induction to identify that tail transcript with the adaptive adversarial tail transcript.

  • rw [ih (hist := hist ++ [(x, flip (A hist x))]) hxs]: rewrite the tail using the bridge theorem already proved for the smaller list.
  • (hist := hist ++ [(x, flip (A hist x))]): tell Lean which updated history to use.
  • hxs: the proof that the tail remains duplicate-free.

      _ = adversarialRounds A hist (x :: xs) := by simp [adversarialRounds]

Finish by simplifying the adversarial head step

Meaning. After the tails have been matched, one final simplification shows that the head labels also line up, so the entire transcripts are equal.

  • simp: simplify the remaining goal.
  • [adversarialRounds]: unfold one step of the adaptive transcript definition.

This theorem is the bridge: if the list has no duplicates, then labeling the list by the global function gives exactly the same transcript as the adaptive construction.

Reason Why the theorem assumes xs.Nodup

A single global function must assign one label to each point. If the same point appeared twice, the online adversary could want different labels at the two occurrences. The distinctness assumption rules out exactly that conflict.
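To see the conflict concretely, take the duplicate list $xs = (x, x)$ and a toy memorizing learner: it repeats the label it has already seen for $x$, and predicts $0$ otherwise (a hypothetical learner, used only for illustration). Then

$$ \hat{y}_1 = 0 \;\Rightarrow\; y_1 = 1, \qquad \hat{y}_2 = 1 \;\Rightarrow\; y_2 = 0 \neq y_1, $$

so no single value of $f(x)$ can reproduce both online labels, and the bridge theorem genuinely needs xs.Nodup.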

5. Back to the wrapper proof

Closing Picture How the whole proof fits together

The wrapper theorem is short because it only combines two earlier results. First, the core induction theorem proves that the online adversary causes one mistake per round. Second, the bridge theorem proves that on a distinct list this adaptive behavior comes from one global function. The wrapper proof then does exactly three things: choose adversarialLabeling A [] xs as the witness, rewrite by the bridge theorem, and finish with the core theorem.