zapcc
Segfaults at random locations
First off, thanks for zapcc, it seems like a big chunk of engineering.
Last week I tried to introduce it as an alternative compiler in my project, and did get integer-factor speedups for incremental recompiles, just like I had hoped.
Unfortunately, I also found some problems that prevented me from using zapcc productively:
- Nondeterministic compiler output if parallel builds are used. If I use more than `-j1` on my build, then even adding a comment to a C++ file will result in changed `.o` files (it looks like sections are different).
- Binaries that segfault sometimes. The binaries created by zapcc sometimes segfault at seemingly random but reproducible locations. That is, a given binary produced by zapcc always crashes at the same place during my program execution. Adding some comments and compiling again sometimes creates a different binary that segfaults in a different location (but again reproducibly so there).
- This does not happen on plain clang++ 7.
- Because of the previous nondeterminism problem, it is extremely difficult to just diff the created binaries to try and spot what zapcc introduces that makes them crash.
- Trying to use `gdb` on it does not help much; crashes happen deep in libraries I use, hinting that invalid memory is at play (also violating assertions about the data that always hold with clang or gcc).
My project is a medium-sized proprietary C++ code base depending on `eigen`, `ceres`, `CGAL` and other large libraries, so it is unfortunately difficult for me to provide a reproducer without too much effort.
I just wanted to report this; perhaps you have some ideas of where the problem might be.
Also, I believe that making zapcc deterministic would be hugely beneficial, so that I could just diff the crashing and non-crashing binaries more easily.
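To make the "changed `.o` files" observation concrete, here is a minimal sketch of how two build trees could be compared file by file. It only assumes that each build leaves its object files under some directory; the function names and the `*.o` pattern are my own choices for illustration, not anything zapcc provides.

```python
import hashlib
from pathlib import Path


def hash_objects(build_dir, pattern="*.o"):
    """Map each object file (path relative to build_dir) to its SHA-256 digest."""
    root = Path(build_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob(pattern))
    }


def changed_objects(build_a, build_b):
    """Return the sorted names of object files that differ or exist in only one tree."""
    a, b = hash_objects(build_a), hash_objects(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `changed_objects("build_before", "build_after")` (both directory names hypothetical) after a comment-only edit would list the object files whose bytes changed. It says nothing about which sections differ; that still needs a tool such as `objdump` or `readelf` on the listed files.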
zapcc is non-deterministic since it keeps state between compilations and may use it to its benefit. For example, zapcc can inline a function from a previously compiled source file, very similar to the link-time optimization phase. It will remember the dependency on the other source code in such a case. Even with `-j1` the binary may not be identical, depending upon compilation order.
The usual way to debug such a problem is to use creduce. We have done maybe 1000 reduces of similar problems. Even very, very big projects were reduced to 1-3 files of a few lines each and then made into zapcc regression tests, single files into the `single` directory and multi-file tests into `multi`. Take a look.
The reduce process takes several hours to several days to complete and requires some manual help where the human outsmarts creduce. The final manual reducing is sort of a C++ puzzle.
With the final reduced example it's possible to start debugging zapcc and seeing what it does wrong.
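For readers who have not used creduce: it repeatedly shrinks the input and keeps a variant only if an "interestingness test" exits with status 0. That test can be any executable, so a rough Python sketch for this kind of bug might look like the following. The file name `repro.cpp`, the output names, the flags and the `zapcc++` driver name are all assumptions for illustration; adjust them to the real build. The check also only works on POSIX, where a process killed by a signal reports a negative return code.

```python
import signal
import subprocess

# Hypothetical commands. creduce runs the test inside a temporary directory
# holding the current variant of the file, so paths are relative.
REFERENCE_BUILD = ["clang++", "-O2", "repro.cpp", "-o", "ref.out"]
SUSPECT_BUILD = ["zapcc++", "-O2", "repro.cpp", "-o", "bad.out"]  # assumed driver name


def exit_code(cmd, timeout=60):
    """Run cmd and return its exit code, or None if it cannot start or times out."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return done.returncode


def is_interesting(reference_build, reference_run, suspect_build, suspect_run):
    """True if both builds succeed, the reference binary exits cleanly,
    and the suspect binary is killed by SIGSEGV."""
    if exit_code(reference_build) != 0 or exit_code(suspect_build) != 0:
        return False
    if exit_code(reference_run) != 0:
        return False
    return exit_code(suspect_run) == -signal.SIGSEGV


def main():
    """Return 0 (interesting) or 1, the convention creduce expects."""
    ok = is_interesting(REFERENCE_BUILD, ["./ref.out"], SUSPECT_BUILD, ["./bad.out"])
    return 0 if ok else 1
```

To use it as the script passed to creduce, end the file with `raise SystemExit(main())` and make it executable. Requiring the reference binary to exit cleanly is a deliberate choice: without it, creduce tends to drift toward code with undefined behavior that crashes under every compiler, which no longer says anything about zapcc.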