rust
rust copied to clipboard
Set up an automated valgrind test
Not usually necessary for Rust, but since we're doing a lot of unsafe stuff with pointers, we should have a valgrind test. Ideally, Travis would run it and fail the build if there are issues.
It is always a good idea to run code/tests through valgrind. One could simply run the current tests (without docs) through valgrind, and add more that use yet uncovered code path.
Something interesting too would be to add code coverage to make sure as much as possible is covered.
And then even fuzzing; I've heard good things of cargo fuzz
.
Note that jemalloc removed valgrind support (see https://github.com/jemalloc/jemalloc/commit/9a8add1510456464bc496320990ec234798bd381) but this does not seems to be included in rust's version of a jemalloc. Rust uses jemalloc @ 33184bf69813087bf1885b0993685f9d03320c69 plus a couple of patches, which is jemalloc 4.1.0 plus four patches.
Well, I read this recent blog post on using Valgrind for this very purpose, and we are still advised to change the memory allocation system (using the nightly alloc_system
feature). Other than that, this is a useful resource to whoever is attempting to resolve this issue.
Great read, thanks.
Maybe an optional combination alloc_system
+nightly+valgrind could be added. As @adamcrume mentioned in #66, it might not be a good idea to depend on alloc_system
and nightly for the crate, but an option to increase the testing is definitely a good idea.
We might copy the suggestion at https://github.com/rust-lang/rust/issues/28224#issuecomment-281552903, which is to add a nightly
feature and add this to the code:
#![cfg_attr(feature="nightly", feature(alloc_system))]
#[cfg(feature="nightly")]
extern crate alloc_system;
Then we can build on the nightly channel with --features=nightly
and run valgrind against that.
So I've experimented running the tests through valgrind, but it's not as trivial as it looks.
First, as mentioned above, using jemalloc (the default memory allocator in rust) will confuse valgrind. So one has to build the tests using the system allocator, which is only available when using a nightly build of rust. This means Travis would need to be setup to run nightly, possibly in addition to stable rust.
Second (and most importantly), we don't want to run cargo through valgrind, but the tests binaries. This means the command used to run the tests should not be something like valgrind cargo test
because valgrind will catch cargo errors, not the bindings' ones. Suppressions could be written but this is a tedious task.
"Normally" people would run valgrind directly on the test binaries, for example valgrind tensorflow_rust.git/target/debug/deps/lib-aab328fe9f627d08
. This has two problems:
- How to programatically extract the filename? There is a couple of these binaries in the target directory. We'd need to find them and execute them. This complexifies the
.travis.yml
file. - The binaries are linked dynamically to
libtensorflow.so
and so must find them at run time. Simply running the binaries will fail witherror while loading shared libraries: libtensorflow.so: cannot open shared object file: No such file or directory
ifLD_LIBRARY_PATH
is not set properly.
Cargo will set a working environment for the binaries so it can run them. It also knows exactly what to run. So it's sad that can't directly run cargo valgrind test
or something like that.
Instead of using valgrind, we could try out the different sanitizers as provided by llvm. See here for more information: https://github.com/japaric/rust-san With a nightly rust newer than a month old, the sanitizers are included. Using them is quite easy and should work I think in Travis. On my machine for example:
$ cd tensorflow-sys
$ RUSTFLAGS="-Z sanitizer=leak" cargo +nightly test -vv -j 1 --target x86_64-unknown-linux-gnu -- --nocapture
[...]
=================================================================
==28527==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x55620849f0ea (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x400ea)
#1 0x55620846d0ac (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe0ac)
#2 0x55620846d020 (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe020)
#3 0x55620846d1bd (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe1bd)
#4 0x55620847b5be (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x1c5be)
#5 0x556208470040 (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x11040)
#6 0x5562084d8aaa (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
#7 0x55620846f680 (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x10680)
#8 0x5562084d8aaa (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
#9 0x5562084762f6 (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x172f6)
#10 0x5562084d0924 (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x71924)
#11 0x7ffbbbee02e6 (/usr/lib/libpthread.so.0+0x72e6)
SUMMARY: LeakSanitizer: 4 byte(s) leaked in 1 allocation(s).
error: test failed
when adding a simple std::mem::forget(Box::new(42));
to a test to force a memory leak.
This is not perfect; the implementation is not bug free nor finished. I'm also not seeing where the allocation was performed (as shown in the link example) but at least it's catching it.
What I think should be done is to first extend the Travis configuration to allow stable/beta/nightly rust. This would allow adding more testing pieces, like sanitizers or valgrind. I'll open a PR about this.
I'd like to see how a compiler-based sanitizer works when interacting with code it didn't compile, i.e. what happens when we allocate something through libtensorflow.so and then don't release that.
I assume that the offsets in that stack trace could be traced back to a line of code via nm
voodoo. Maybe they'll build that into the sanitizer at some point.