lean4 icon indicating copy to clipboard operation
lean4 copied to clipboard

fix: don't write padding bits to olean files

Open digama0 opened this issue 1 year ago • 4 comments

With USE_GMP=OFF, the implementation of mpz used a bool next to a size_t, resulting in 7 bytes of padding that are faithfully copied into the .olean files on affected systems, and resulting in nondeterminism in lake hashes. (This is not a problem for the mathlib cache, because the CI machines are running Linux x86, which enables GMP.)

PS: Rather than this representation, it would be nice if we could use

struct mpz {
  uint32_t capacity;
  int32_t sign_size;
  mpn_digit * m_digits;
};

(where capacity == abs(sign_size)) for ABI compatibility between GMP and non-GMP variants of lean built on the same system. Otherwise one has to know whether GMP is enabled or not in order to encode/decode these files (I'm going to implement this in leantar using the architecture, but if we decide to change the GMP setting at some point this will be an ABI break with no indicator).

digama0 avatar Nov 18 '23 01:11 digama0

  • ❗ Mathlib CI will not be attempted unless you rebase your PR onto the 'nightly' branch. (2023-11-19 05:26:41)
  • ✅ Mathlib branch lean-pr-testing-2908 has successfully built against this PR. (2023-11-19 08:07:40) View Log

@leodemoura Do you agree with the change in principle? I believe it can save us from major headaches down the road, though the comment definitely has to be extended!

Kha avatar Nov 24 '23 11:11 Kha

I could also use bool m_size; uint8_t m_padding[7]; there if preferred (although I guess that should be 3 on 32-bit systems).

An alternative fix is to zero the padding bits in the compactor after writing the struct. This is a bit annoying to write though since we have to manually calculate the field offsets (and this is architecture dependent as mentioned). I think copying the struct one field at a time instead of memcpy'ing the whole thing should also ensure the padding is zero (as the backing buffer is initially zeroed).

digama0 avatar Nov 25 '23 05:11 digama0

An alternative fix is to zero the padding bits in the compactor after writing the struct.

Another solution is to copy across the fields one-by-one, since the memory was zeroed to start with:

--- a/stage0/src/runtime/compact.cpp
+++ b/stage0/src/runtime/compact.cpp
@@ -278,7 +278,10 @@ void object_compactor::insert_mpz(object
     size_t data_sz = sizeof(mpn_digit) * to_mpz(o)->m_value.m_size;
     size_t sz      = sizeof(mpz_object) + data_sz;
     mpz_object * new_o = (mpz_object *)alloc(sz);
-    memcpy(new_o, to_mpz(o), sizeof(mpz_object));
+    // `new_o->m_value` has padding, which we want to leave as zero
+    memcpy(new_o, o, sizeof(lean_object));
+    new_o->m_value.m_sign = to_mpz(o)->m_value.m_sign;
+    new_o->m_value.m_size = to_mpz(o)->m_value.m_size;

eric-wieser avatar Aug 19 '24 23:08 eric-wieser