core-dump-handler
core-dump-handler copied to clipboard
Allow to disable ZIP compression & fix bug in overwriting info files
This MR fixes two issues seen in a production environment:
- We have the problem that big core dumps were corrupted which can be fixed by using
io::copy
instead of the manual read/write logic. It's unclear to me why because I am not a Rust guru. Another problem with big core dumps was that they turned out to be corrupted when compression is enabled. The-D
flag works well for that and just stores data as-is in the resulting ZIP file. - The
counter
variable to gather information on containers was never incremented thus overwriting previous container information.
Thanks for this PR. My initial reaction is that if the zip is corrupting the core dumps then it should just be turned off rather than having a flag.
That said I'd like to investigate it further if possible. Do you know the the size of file that is failing along with the OS of the host so I can try to reproduce?
Also the merge is blocked as cargo fmt
needs to be ran on the code.
Thanks for the comment! I just ran cargo fmt
and updated the PR.
Your reaction is completely right. I just though I wouldn't want to disable it for everybody because for some sizes it probably works - otherwise I would have suspected much more bug reports. In my case it is a 8GB core dump on an Ubuntu 22.04 machine.
Thanks for the update. Let me see if I can reproduce this over the weekend and get back to you.
Hi there, are there any updates on getting this reproduced on your end? Let me know if you need more information!
Haven't had the bandwidth to look at it yet. Too much happening at this time of year. If you have some bandwidth can you create a large core file (could be just random data) and replace test.core then run the tests to see if it reproduces? Thanks
I followed your idea and ran the tests locally and I can produce a failed unit test with a random core file of size 512M:
base64 /dev/urandom | head -c 512M >./core-dump-composer/mocks/test.core
Test output:
---- default_scenario stdout ----
The current directory is <...>
Running tests using this PATH: <...>
timeout
ReadDir("./output")
thread 'default_scenario' panicked at 'assertion failed: `(left == right)`
left: `8`,
right: `1`', core-dump-composer/tests/default.rs:162:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
default_scenario
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 120.01s
error: test failed, to rerun pass '-p core-dump-composer --test default'
A core file of 256M was working fine BTW.
As a side note: for the tests I always needed to clear the folder core-dump-composer/output
otherwise I was getting some weird hickup in the test. Probably not a problem for CI but just wanted to mention it.
Excellent thanks for looking into this and providing reproduction steps. The hickup is intentional for a failed test. It's good to have the failed artifacts when it breaks but it should clean up if it's successful - I'll check it when I look into it.
Also: if I change the tests to use the new -D
flag introduce with my MR the testcases run through with a 512MB test.core
.
Found the problem - The default time out for the thread is 120 seconds and the zip for 512M is taking longer than that. I've bumped the timeout to 600 seconds and the tests now pass with the 512M Can you set the default time out to 600 or 10 mins as part of this PR please. https://github.com/IBM/core-dump-handler/blob/main/core-dump-composer/src/config.rs#L62
Given the impact of compression I think having the option to disable it is a good feature to have so I'm happy to land this PR.
@timbuchwaldt any additional thoughts on the timeout setting change?
Okay perfect, thanks for your help! I just pushed a commit that changes the default for the timeout parameter.
Thanks for the change. Can you apply the signoff for DCO please Instructions in this link. https://github.com/IBM/core-dump-handler/pull/114/checks?check_run_id=9292246336
I'll look to merge this over the weekend.
Yes, sorry, done! Thank you a lot for taking care!
One question: is there a plan when this going to be released in a new minor release? Thanks.
Hey @stonemaster I've made some changes to this PR in order to make it configurable from the chart. These are all rolled up into https://github.com/IBM/core-dump-handler/pull/116 If the changes are ok for you I'll roll a release. You should be able to validate that release by checking out the pr and running it as the image will also be built