
Segfaults at random locations

Open nh2 opened this issue 4 years ago • 1 comments

First off, thanks for zapcc, it seems like a big chunk of engineering.

Last week I tried to introduce it as an alternative compiler in my project, and did get integer-factor speedups for incremental recompiles, just like I had hoped.

Unfortunately, I also found some problems that prevented me from using zapcc productively:

  • Nondeterministic compiler output if parallel builds are used. If I use more than -j1 on my build, then even adding a comment to a C++ file will result in changed .o files (it looks like sections are different).
  • Binaries that segfault sometimes. The binaries created by zapcc sometimes segfault at seemingly random, but reproducible locations. That is, a given binary produced by zapcc always crashes at the same place during my program execution. Adding some comments and compiling again sometimes creates a different binary that segfaults in a different location (but again reproducibly so there).
    • This does not happen on plain clang++ 7.
    • Because of the previous nondeterminism problem, it is extremely difficult to just diff the created binaries to try and spot what zapcc introduces that makes them crash.
    • Trying to use gdb on it does not help much; crashes happen deep in libraries I use, hinting that invalid memory is at play (also violating assertions about the data that always hold with clang or gcc).

My project is a medium-sized proprietary C++ code base depending on eigen, ceres, CGAL and other large libraries, so it is unfortunately difficult for me to provide a reproducer without a lot of effort.

I just wanted to report this; perhaps you have some ideas of where the problem might be.

Also, I believe that making zapcc deterministic would be hugely beneficial, so that I could just diff the crashing and non-crashing binaries more easily.
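To make the kind of diffing described here concrete, a first step could be to find which object files differ at all between two builds. The following is only a minimal sketch: the function names and the idea of keeping two separate build directories are assumptions for illustration, not part of zapcc or of the project's build setup.

```python
import hashlib
from pathlib import Path


def hash_objects(build_dir):
    """Map each .o path (relative to build_dir) to the SHA-256 of its bytes."""
    root = Path(build_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.o"))
    }


def differing_objects(build_a, build_b):
    """Return sorted relative paths of .o files that differ or exist in only one build."""
    a = hash_objects(build_a)
    b = hash_objects(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

With a deterministic compiler, two builds of the same sources would yield an empty list, and a comment-only change would ideally touch nothing either. With the behavior described above, the list is likely to be long, which is exactly what makes a byte-level comparison of crashing and non-crashing binaries hard.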

nh2 avatar Feb 19 '20 20:02 nh2

zapcc is non-deterministic because it keeps state between compilations and may use that state to its benefit. For example, zapcc can inline a function from a previously compiled source file, much like a link-time optimization phase. In that case it will remember the dependency on the other source file. Even with -j1 the binary may not be identical, depending on the compilation order.

The usual way to debug such a problem is to use creduce. We have done maybe 1000 reductions of similar problems. Even very, very big projects were reduced to 1-3 files of a few lines each and then made into zapcc regression tests: single files into the single directory and multi-file tests into multi. Take a look. The reduction process takes several hours to several days to complete and requires some manual help where the human outsmarts creduce. The final manual reduction is sort of a C++ puzzle. With the final reduced example it's possible to start debugging zapcc and see what it does wrong.
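For readers who have not used creduce: it repeatedly shrinks the source and runs a user-supplied "interestingness test" in a temporary directory, keeping a change only if the test exits with status 0. A rough sketch of such a test for the single-file case might look like the following. The compiler names `clang++` and `zapcc++`, the file name `repro.cpp`, and the `-O2` flag are assumptions for illustration, and this is not the script used for the zapcc regression tests.

```python
import subprocess

REFERENCE = ["clang++"]  # assumption: known-good compiler on PATH
SUSPECT = ["zapcc++"]    # assumption: name of the zapcc C++ driver on PATH
SOURCE = "repro.cpp"     # hypothetical name of the file being reduced


def crashed_by_signal(returncode):
    """On POSIX, subprocess reports death by signal as a negative return code."""
    return returncode < 0


def build_and_run(compiler, source, exe):
    """Compile source to exe and run it; return its exit code, or None on failure."""
    build = subprocess.run(compiler + ["-O2", source, "-o", exe], capture_output=True)
    if build.returncode != 0:
        return None
    try:
        return subprocess.run(["./" + exe], capture_output=True, timeout=10).returncode
    except subprocess.TimeoutExpired:
        return None


def is_interesting(ref_rc, suspect_rc):
    """Interesting: the reference build exits 0 and the suspect build dies from a signal."""
    return ref_rc == 0 and suspect_rc is not None and crashed_by_signal(suspect_rc)


def main():
    ref_rc = build_and_run(REFERENCE, SOURCE, "ref.out")
    suspect_rc = build_and_run(SUSPECT, SOURCE, "suspect.out")
    return 0 if is_interesting(ref_rc, suspect_rc) else 1

# When saved as an executable script for creduce, end the file with:
#     raise SystemExit(main())
```

Requiring the reference build to exit cleanly is a cheap guard against creduce reducing the program into something that crashes under every compiler. It does not rule out undefined behavior, and it does not cover the multi-file case, where the failure depends on state carried over from an earlier compilation.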

yrnkrn avatar Feb 21 '20 09:02 yrnkrn