.enamlc files are nondeterministic
While working on reproducible builds for openSUSE, I found that
our python-enaml 0.10.4 package varies between builds because of nondeterministic bits in .enamlc files.
/usr/lib64/python3.7/site-packages/enaml/workbench/ui/__enamlcache__/workbench_window.enaml-py37-cv26.enamlc differs at offset '655'
-00000280 20 da 0b 6d 61 6b 65 5f 6f 62 6a 65 63 74 63 01 | ..make_objectc.|
+00000280 20 da 0b 6d 61 6b 65 5f 6f 62 6a 65 63 74 e3 01 | ..make_object..|
00000290 00 00 00 00 00 00 00 03 00 00 00 0f 00 00 00 43 |...............C|
The difference appears to be a single bit, so two builds will match by chance about half of the time. It should still be easy to reproduce by force-compiling the relevant source 10 times, like this:
for i in $(seq 1 10) ; do
$FORCECOMPILE
md5sum $ENAMLCFILE
done | sort | uniq -c
If the output is deterministic, there should be just 1 line, with a count of 10 for a single md5.
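For anyone who prefers to run the same check from Python, here is a minimal sketch of the idea. The names `count_digests` and `is_deterministic` are made up for illustration, and `build` stands for whatever callable force-compiles the source and returns the bytes of the resulting .enamlc file; none of this is part of enaml. It uses SHA-256 rather than md5, which makes no difference for this purpose.

```python
import hashlib
from collections import Counter
from typing import Callable


def count_digests(build: Callable[[], bytes], runs: int = 10) -> Counter:
    """Call build() `runs` times and count the distinct digests of its output."""
    return Counter(hashlib.sha256(build()).hexdigest() for _ in range(runs))


def is_deterministic(build: Callable[[], bytes], runs: int = 10) -> bool:
    """Return True when every run produced byte-identical output."""
    return len(count_digests(build, runs)) == 1
```

A deterministic build yields a counter with a single entry whose count equals `runs`, which corresponds to the single `uniq -c` line in the shell loop.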
See also https://reproducible-builds.org/ for why deterministic program behaviour is good.
I am surprised: for the difference to sit that far into the file, marshal itself would have to be nondeterministic. The code generating the cache is rather straightforward (see https://github.com/nucleic/enaml/blob/master/enaml/core/import_hooks.py#L297). Alternatively, the generated code could be unstable, but that would be strange too.
Ah, indeed I found problems with python marshal earlier: https://bugs.python.org/issue34033
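The hexdump above fits that explanation. The two values of the differing byte, 0x63 and 0xe3, differ only in the high bit, which CPython's marshal.c uses as FLAG_REF on a type byte, and the low seven bits are 'c', the type code for a code object. As far as I understand it, marshal sets that flag depending on the object's reference count at dump time, which would explain a one-bit difference between otherwise identical builds. A small sketch to decode the byte, assuming it really is a marshal type byte (the helper name `describe_marshal_type_byte` is made up for illustration):

```python
from typing import Tuple

# In CPython's marshal.c, the high bit of a type byte is FLAG_REF.
FLAG_REF = 0x80


def describe_marshal_type_byte(byte: int) -> Tuple[str, bool]:
    """Split a marshal type byte into (type code character, FLAG_REF set?)."""
    return chr(byte & ~FLAG_REF & 0xFF), bool(byte & FLAG_REF)


# The two byte values seen at offset 655 in the hexdump.
first, second = 0x63, 0xE3

print(first ^ second == FLAG_REF)          # True
print(describe_marshal_type_byte(first))   # ('c', False)
print(describe_marshal_type_byte(second))  # ('c', True)
```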
I guess that as long as .pyc files are not reproducible, .enamlc files won't be either, since the two formats are extremely close. I will keep this open in the meantime.