[Paper Artifact] Bernoulli sampling theorem

Open hoheinzollern opened this issue 6 months ago • 1 comments

This PR contains the Bernoulli sampling theorem as described in the paper:

Formalizing concentration inequalities in Rocq: infrastructure and automation

by Reynald Affeldt, Alessandro Bruni, Cyril Cohen, Pierre Roux, Takafumi Saikawa

Much of the code leading up to this formalization has already been integrated through various PRs in MathComp and MathComp-Analysis where appropriate. See Figure 2 of the paper for a detailed explanation of the formalized theories developed for this PR, and the list below for references to the merged PRs.

Structure of the proof

The proof is inspired by Rajani's pen and paper proof.

One key lemma is mmt_gen_fun_expectation, which establishes the expectation of the moment generating function of a Bernoulli trial using the product probability measure. Then follows the proof bernoulli_trial_mmt_gen_fun, which establishes that the moment generating function of a Bernoulli trial is the product of each moment generating function.

One key step in the above proofs is to show:

$$ \mathbb E_{\otimes_n P}[\prod_{i < n} X_i] = \prod_{i < n} \mathbb E_P[X_i] $$

i.e. that the expectation of the product of $n$ random variables on the power measure of P is the product of the expectations of each variable with probability measure P.

To prove our final results, we need to establish a sequence of key analytical lemmas, namely:

exp2_le8: $e^2 \le 8$, inequality solved by using CoqInterval.
xlnx_lbound_i01: lower bound for $x \cdot \ln(x)$ in the interval $]0, 1[$.
xlnx_ubound_i1y: upper bound for $x \cdot \ln(x)$ for $x > 1$.

The proof itself is split into a sequence of intermediate (concentration) inequalities:

sampling_ineq1: Concentration inequality on a Bernoulli trial $X$, bounding the probability of $X \ge (1+\delta) \mathbb E_{\otimes_n P}[X]$.
sampling_ineq2: Specialization of sampling_ineq1 using xlnx_lbound_i12
sampling_ineq3: Concentration inequality on a Bernoulli trial $X$, bounding the probability of $X \le (1-\delta) \mathbb E_{\otimes_n P}[X]$
sampling_ineq4: Combines the previous two inequalities to obtain a bound on the probability of $|X - \mathbb E_{\otimes_n P}[X]| \ge \delta \mathbb E_{\otimes_n P}[X]$

Finally, sampling is the main sampling theorem combining the above inequalities.

Notes on the current state of this PR

Much of this formalization has already been integrated in MathComp and MathComp-Analysis, and the goal of the current PR is to serve as a compendium to the paper. A few integration tasks currently remain:

first is the instantiation of semi-norms for Lp spaces, which, due to MathComp-Analysis aiming to be compatible with at least two version of MathComp, will not be integrated until support is dropped for MathComp 2.3. PR #1230 is meant to track the experiment of integrating Lp spaces in MathComp-Analysis with semi-norms, and is only compatible with MathComp 2.4 onwards.
second is that the theory of iterated product measures (here called power_measure), is not yet integrated into MathComp-Analysis as of version 1.12. The plan is to merge it into the next release.

As these elements get integrated into the main branches, this PR will shrink to contain only the Bernoulli sampling theorem.

PRs leading up to this one

This is a (somewhat complete) list of PRs to Analysis that have been branched out of this sampling theorem:

#1680
#1678
#1661
#1658
#1657
#1655
#1654
#1651
#1650
#1648
#1624
#1618
#1607
#1606
#1604
#1602
#1600
#1599
#1598
#1597
#1547
#1525
#1524
#1522
#1494
#1053
#1011
#1000
#942
#939
#873

In additions to these PRs, this development contributed the following commits and PRs to MathComp:

https://github.com/math-comp/math-comp/commit/5b293e7fbe1593cefd54af9287a0e22c57b894a2 (seminorm interface)
https://github.com/math-comp/math-comp/commit/23ebefed36ca656a3484c20665433b039b4f8a29 (interval inference)
https://github.com/math-comp/math-comp/pull/1380

Jun 19 '25 15:06 hoheinzollern

CI green

Jul 09 '25 12:07 proux01