No crashes in /var/crash after SEGFAULT on CI
Hello,
I have noticed that there are no crashes in /var/crash after a segmentation fault in the unit tests. For instance:
- https://travis-ci.org/oktal/pistache/jobs/655471456 At the end of the log I see the following output:
...
$ ls -lta /var/crash
total 8
drwxr-xr-x 14 root root 4096 Feb 24 20:58 ..
drwxrwxrwt 2 root root 4096 Feb 18 22:04 .
- https://travis-ci.org/oktal/pistache/jobs/654598559
...
$ ls -lta /var/crash
ls: cannot access '/var/crash': No such file or directory
Is it possible to get crash logs on CI?
Interesting. I never noticed that at the bottom before. I wonder if Travis is changing or setting the "ulimit -c" on the container in a way that is different from when that functionality was first introduced.
Consider: https://github.com/springmeyer/travis-coredump/blob/master/.travis.yml
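In case it's useful, the gist of that example is roughly the following (a sketch only, assuming the container actually lets these settings be changed; it is not taken from pistache's .travis.yml):
```
# Allow core files to be written at all (CI containers often default the limit to 0).
ulimit -c unlimited
# Show where the kernel currently sends core dumps...
cat /proc/sys/kernel/core_pattern
# ...and ask for plain core.<executable>.<pid> files in the working directory instead.
echo 'core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern
```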
@dennisjenkins75 Hello, is Travis CI not running anymore?
Strange. [1] shows that it's been running, as recently as an hour ago, but I do not always see the results on the GitHub page for each PR. It ran for PR #723, but skipped running for PR #735. Most of the recent Travis runs have failed; either the build was legitimately broken, or "apt-get" failed and Travis did not retry it. I'm not a Travis expert, so I'm not sure why it's acting weird. But I need to pay attention to whether the Travis run passes, and I should stop approving PRs unless I get a clear "green" signal from Travis.
[1] https://travis-ci.org/github/oktal/pistache/builds?utm_medium=notification&utm_source=github_status
I've just had a quick look at Travis (with essentially zero knowledge of this CI), since the current build is segfaulting on a couple of tests and there isn't much in the way of backtraces - e.g. https://travis-ci.org/github/oktal/pistache/jobs/692742761#L6052
It isn't clear which directory the after_failure script is being run in, but it doesn't look like it's the directory with the core dump - could this be because the build carries on after the failure and changes directory from Build-Debug-nossl (where the segfault happened) into Build-Release?
Either way, it might be worth adding a cd ${TRAVIS_BUILD_DIR} to the after_failure script, and swapping the gdb command for something closer to the travis-coredump example, with maxdepth increased so the build subfolders are searched too, such as:
for i in $(find ./ -maxdepth 2 -name 'core*' -print); do echo "Coredump $i"; gdb -c $i -ex "thread apply all bt" -ex "set pagination 0" -batch; done;
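Put together, the after_failure steps could end up looking something like this (a sketch only, assuming the build trees live under ${TRAVIS_BUILD_DIR}; it is not the exact script from the repo):
```
# Search from the top of the checkout rather than whichever build directory
# the script happened to stop in.
cd "${TRAVIS_BUILD_DIR}"
# Print a backtrace for every core file found in the build subdirectories.
for i in $(find ./ -maxdepth 2 -name 'core*' -print); do
  echo "Coredump $i"
  gdb -c "$i" -ex "thread apply all bt" -ex "set pagination 0" -batch
done
```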
What do you think?
I've opened a draft PR here to see what happens: https://github.com/oktal/pistache/pull/776
I think I've got a fix for this issue in PR #776 - it turned out to be a combination of things:
- Core dumps are passed to apport due to /proc/sys/kernel/core_pattern
- Not all arches allow modifying the core_pattern, so it's not possible to consistently write the core dump somewhere else.
- apport requires specific configuration to write crash reports for unrecognised software (i.e. anything that isn't directly provided by the Ubuntu repos); a rough sketch of that setup is shown after this list
- apport writes crash reports instead of core dumps to /var/crash/, which have a different filename and need unpacking before GDB can read them.
- GDB needs a copy of the executable that generated the core dump to get debug symbols (the filename is given in the crash report, so that isn't a massive problem).
- The Travis job matrix isn't really suited for running jobs with different versions of the same compiler, and was structured so that the three non-AMD64 arches were being built without any specific configuration or packages (including apport)
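For reference, the apport/core_pattern setup mentioned above amounts to roughly the following; the values are illustrative rather than copied from the PR, and assume a stock Ubuntu image with apport installed (the exact apport core_pattern arguments vary between releases):
```
# Allow core dumps to be produced at all.
ulimit -c unlimited
# On Ubuntu the kernel normally pipes core dumps to apport; confirm (or restore) that.
cat /proc/sys/kernel/core_pattern
echo '|/usr/share/apport/apport %p %s %c %d %P' | sudo tee /proc/sys/kernel/core_pattern
# Tell apport to also write crash reports for binaries that don't belong to an
# Ubuntu package (such as the pistache test executables).
mkdir -p ~/.config/apport
printf '[main]\nunpackaged=true\n' > ~/.config/apport/settings
```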
The changes made in the PR modify the build process to use apport to capture crash reports, then extract them and feed them through GDB to generate a backtrace. It might be useful to add some debuginfo packages to the build images to get more symbols in there, but it's certainly an improvement on no backtrace at all.
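The extraction step boils down to something like this (a sketch rather than the exact commands in the PR; apport-unpack splits a .crash report into one file per field, including the executable path and the raw core dump):
```
# Unpack every apport crash report and run GDB against the recovered core dump.
for report in /var/crash/*.crash; do
  unpacked=$(mktemp -d)
  apport-unpack "$report" "$unpacked"
  # ExecutablePath names the binary that crashed; CoreDump is the core file itself.
  exe=$(cat "$unpacked/ExecutablePath")
  gdb "$exe" -c "$unpacked/CoreDump" -ex "set pagination 0" -ex "thread apply all bt" -batch
done
```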
Unfortunately, the Arm64 build configuration doesn't seem to accept having its core_pattern set to pass the core dump to apport - I'm still fiddling with it to see if I can fix that.
Also, apologies for using so much build server time today.