feat: add BitVec container operations
This PR adds fold and other container-style operations for `BitVec`, following the patterns from `List` and `Array`: fold operations (`foldr`, `foldl`, `foldrIdx`, `foldlIdx`, with `cons` theorems), conversion operations (`ofFn`, `toList`, `toArray`, `toVector`), counting operations (`popcount`, `zerocount`, `countP`, `countIdxP`), transformation operations (`map`, `mapIdx`, `zipWith`), and distance/similarity operations (`dot`, `hammingDist`, `parity`).
Related Zulip discussion: https://leanprover.zulipchat.com/#narrow/channel/217875-Is-there-code-for-X.3F/topic/Why.20isn't.20there.20code.20for.20.60BitVec.2Efoldr.60.3F
## New modules
- `Init.Data.BitVec.Folds` (additions): added `foldrIdx_nil`, `foldrIdx_cons`, `foldlIdx_nil`, `foldlIdx_cons` theorems for indexed folds
- `Init.Data.BitVec.OfFn`: conversion operations (`ofFn`, `toList`, `toArray`, `toVector`, `ofArray`, `ofVector`)
- `Init.Data.BitVec.Count`: counting operations (`popcount`, `zerocount`, `countP`, `countIdxP`)
- `Init.Data.BitVec.Map`: transformation operations (`map`, `mapIdx`, `zipWith`)
- `Init.Data.BitVec.Hamming`: distance and similarity operations (`dot`, `hammingDist`, `parity`)
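To make the intended API shape concrete, here are reference-style sketches of two of the operations, written only against the existing core `BitVec`/`List` API. The `Spec` names are mine; the PR's actual definitions and signatures may differ:

```lean
-- Illustrative specs only, not the PR's code.
def popcountSpec {w : Nat} (x : BitVec w) : Nat :=
  (List.range w).countP (fun i => x.getLsbD i)

-- Hamming distance is the popcount of the XOR.
def hammingDistSpec {w : Nat} (x y : BitVec w) : Nat :=
  popcountSpec (x ^^^ y)

#eval popcountSpec 0b1011#4              -- 3
#eval hammingDistSpec 0b1011#4 0b1001#4  -- 1
```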
## Implementation details
- All operations include theorems and lemmas
- Added an induction principle for `BitVec` treating it as a cons-based structure (see the sketch after this list)
- Added `cons` theorems for the indexed fold operations (`foldrIdx`, `foldlIdx`)
- Test coverage in `tests/lean/run/bitvec_container.lean` (60+ test cases)
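Since the cons-based view may be easier to see in code, here is a minimal sketch of recursing on a `BitVec` from the most significant bit, built only from core primitives (`msb` reads the top bit, `setWidth n` keeps the low `n` bits). The PR's induction principle follows the same shape, though its actual statement may differ:

```lean
-- Illustrative sketch, not the PR's definition.
def foldrBitsSketch {α : Type} (f : Bool → α → α) (init : α) :
    (n : Nat) → BitVec n → α
  | 0, _ => init
  | n + 1, x => f x.msb (foldrBitsSketch f init n (x.setWidth n))

#eval foldrBitsSketch (fun b acc => acc + b.toNat) 0 4 0b1011#4  -- 3
```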
## Notes
- I'm open to golfing or profiling some of the proofs, but I wanted to get your opinion on whether this is all worth adding to the stdlib.
- Perhaps some modules like `Hamming.lean` should go to mathlib4 instead?
Mathlib CI status (docs):
- ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the `nightly-with-mathlib` branch. Try `git rebase e39894e62d363322fa18af76e12e54799496e2c0 --onto 41356740210745d6b9735cb779fbd0f9eee591e5`. You can force Mathlib CI using the `force-mathlib-ci` label. (2025-11-23 23:01:20)
- ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the `nightly-with-mathlib` branch. Try `git rebase e39894e62d363322fa18af76e12e54799496e2c0 --onto f2e191d0afbb5adcc6f12e1779c8b141fc9af996`. You can force Mathlib CI using the `force-mathlib-ci` label. (2025-11-24 21:14:16)
- ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the `nightly-with-mathlib` branch. Try `git rebase e39894e62d363322fa18af76e12e54799496e2c0 --onto b0e6db322421ce368999f08870fd747e45242f86`. You can force Mathlib CI using the `force-mathlib-ci` label. (2025-11-27 07:13:34)
Reference manual CI status:
- ❗ Reference manual CI will not be attempted unless your PR branches off the `nightly-with-manual` branch. Try `git rebase e39894e62d363322fa18af76e12e54799496e2c0 --onto d3dda9f6d4428a906c096067ecb75e432afc4615`. You can force reference manual CI using the `force-manual-ci` label. (2025-11-23 23:01:21)
I think you should have reverse induction (inducting from the least significant bit) as well. This would use `BitVec.concat`.
@linesthatinterlace That's a good idea! I'm working on adding it at the moment. But do you mean *from the most significant bit*?
No, I originally said that, but I think `cons` is actually defined from the most significant bit, because `BitVec` generally takes a big-endian approach.
```lean
/-- Append a single bit to the end of a bitvector, using big endian order (see `append`).
That is, the new bit is the least significant bit. -/
@[expose]
def concat {n} (msbs : BitVec n) (lsb : Bool) : BitVec (n+1) := msbs ++ (ofBool lsb)

-- ...

/--
Prepends a single bit to the front of a bitvector, using big-endian order (see `append`).
The new bit is the most significant bit.
-/
@[expose]
def cons {n} (msb : Bool) (lsbs : BitVec n) : BitVec (n+1) :=
  ((ofBool msb) ++ lsbs).cast (Nat.add_comm ..)
```
If for `concat` the new bit becomes the LSB, then we started with (processed first) the MSBs, and vice versa for `cons`?
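For concreteness, the orientation can be checked directly against the core definitions quoted above:

```lean
#eval (BitVec.cons true 0b00#2).toNat    -- 4, i.e. 0b100: new bit is the MSB
#eval (BitVec.concat 0b00#2 true).toNat  -- 1, i.e. 0b001: new bit is the LSB
```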
I was able to get a native popcount implementation that uses either compiler builtins or GMP (or naive C++), depending on the vector's size (and GMP availability). I did some benchmarking with and without this implementation: for smaller vectors (64 bits) it's roughly a 10x improvement (2056ns vs 240ns average execution time), and for larger vectors the difference is even more pronounced (205x: 50531ns vs 246ns).
I'm also by no means a C++ expert, so any feedback is welcome!
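For anyone who wants to eyeball the difference locally, here is a rough timing harness sketch. It compiles only with this PR applied (it uses the new `popcount`, which I'm assuming returns a `Nat`); the numbers above came from a proper benchmark, not from this:

```lean
-- Rough sketch; the 0b0101... test pattern and iteration count are arbitrary.
def input : BitVec 4096 := BitVec.ofNat 4096 ((2 ^ 4096) / 3)

#eval timeit "popcount, 4096 bits x 1000" do
  let mut acc := 0
  for _ in [0:1000] do
    acc := acc + input.popcount
  pure acc
```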