rust icon indicating copy to clipboard operation
rust copied to clipboard

Set up an automated valgrind test

Open adamcrume opened this issue 7 years ago • 7 comments

Not usually necessary for Rust, but since we're doing a lot of unsafe stuff with pointers, we should have a valgrind test. Ideally, Travis would run it and fail the build if there are issues.

adamcrume avatar Mar 10 '17 02:03 adamcrume

It is always a good idea to run code/tests through valgrind. One could simply run the current tests (without docs) through valgrind, and add more that use yet uncovered code path.

Something interesting too would be to add code coverage to make sure as much as possible is covered.

And then even fuzzing; I've heard good things of cargo fuzz.

nbigaouette-eai avatar Mar 14 '17 00:03 nbigaouette-eai

Note that jemalloc removed valgrind support (see https://github.com/jemalloc/jemalloc/commit/9a8add1510456464bc496320990ec234798bd381) but this does not seems to be included in rust's version of a jemalloc. Rust uses jemalloc @ 33184bf69813087bf1885b0993685f9d03320c69 plus a couple of patches, which is jemalloc 4.1.0 plus four patches.

nbigaouette-eai avatar Mar 14 '17 00:03 nbigaouette-eai

Well, I read this recent blog post on using Valgrind for this very purpose, and we are still advised to change the memory allocation system (using the nightly alloc_system feature). Other than that, this is a useful resource to whoever is attempting to resolve this issue.

Enet4 avatar Mar 14 '17 22:03 Enet4

Great read, thanks.

Maybe an optional combination alloc_system+nightly+valgrind could be added. As @adamcrume mentioned in #66, it might not be a good idea to depend on alloc_system and nightly for the crate, but an option to increase the testing is definitely a good idea.

nbigaouette-eai avatar Mar 15 '17 00:03 nbigaouette-eai

We might copy the suggestion at https://github.com/rust-lang/rust/issues/28224#issuecomment-281552903, which is to add a nightly feature and add this to the code:

#![cfg_attr(feature="nightly", feature(alloc_system))]
#[cfg(feature="nightly")]
extern crate alloc_system;

Then we can build on the nightly channel with --features=nightly and run valgrind against that.

adamcrume avatar Mar 15 '17 01:03 adamcrume

So I've experimented running the tests through valgrind, but it's not as trivial as it looks.

First, as mentioned above, using jemalloc (the default memory allocator in rust) will confuse valgrind. So one has to build the tests using the system allocator, which is only available when using a nightly build of rust. This means Travis would need to be setup to run nightly, possibly in addition to stable rust.

Second (and most importantly), we don't want to run cargo through valgrind, but the tests binaries. This means the command used to run the tests should not be something like valgrind cargo test because valgrind will catch cargo errors, not the bindings' ones. Suppressions could be written but this is a tedious task.

"Normally" people would run valgrind directly on the test binaries, for example valgrind tensorflow_rust.git/target/debug/deps/lib-aab328fe9f627d08. This has two problems:

  1. How to programatically extract the filename? There is a couple of these binaries in the target directory. We'd need to find them and execute them. This complexifies the .travis.yml file.
  2. The binaries are linked dynamically to libtensorflow.so and so must find them at run time. Simply running the binaries will fail with error while loading shared libraries: libtensorflow.so: cannot open shared object file: No such file or directory if LD_LIBRARY_PATH is not set properly.

Cargo will set a working environment for the binaries so it can run them. It also knows exactly what to run. So it's sad that can't directly run cargo valgrind test or something like that.

Instead of using valgrind, we could try out the different sanitizers as provided by llvm. See here for more information: https://github.com/japaric/rust-san With a nightly rust newer than a month old, the sanitizers are included. Using them is quite easy and should work I think in Travis. On my machine for example:

$ cd tensorflow-sys
$ RUSTFLAGS="-Z sanitizer=leak" cargo +nightly test -vv -j 1 --target x86_64-unknown-linux-gnu -- --nocapture
[...]
=================================================================
==28527==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x55620849f0ea  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x400ea)
    #1 0x55620846d0ac  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe0ac)
    #2 0x55620846d020  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe020)
    #3 0x55620846d1bd  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe1bd)
    #4 0x55620847b5be  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x1c5be)
    #5 0x556208470040  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x11040)
    #6 0x5562084d8aaa  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
    #7 0x55620846f680  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x10680)
    #8 0x5562084d8aaa  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
    #9 0x5562084762f6  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x172f6)
    #10 0x5562084d0924  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x71924)
    #11 0x7ffbbbee02e6  (/usr/lib/libpthread.so.0+0x72e6)

SUMMARY: LeakSanitizer: 4 byte(s) leaked in 1 allocation(s).
error: test failed

when adding a simple std::mem::forget(Box::new(42)); to a test to force a memory leak.

This is not perfect; the implementation is not bug free nor finished. I'm also not seeing where the allocation was performed (as shown in the link example) but at least it's catching it.

What I think should be done is to first extend the Travis configuration to allow stable/beta/nightly rust. This would allow adding more testing pieces, like sanitizers or valgrind. I'll open a PR about this.

nbigaouette-eai avatar Mar 20 '17 15:03 nbigaouette-eai

I'd like to see how a compiler-based sanitizer works when interacting with code it didn't compile, i.e. what happens when we allocate something through libtensorflow.so and then don't release that.

I assume that the offsets in that stack trace could be traced back to a line of code via nm voodoo. Maybe they'll build that into the sanitizer at some point.

adamcrume avatar Mar 21 '17 02:03 adamcrume