tarpaulin icon indicating copy to clipboard operation
tarpaulin copied to clipboard

Running with --follow-exec makes tests very slow

Open Ch00k opened this issue 3 years ago • 10 comments

I am using tarpaulin with the --follow-exec option while running tests for my binary crate (a CLI app), and I see that the tests are incredibly slow compared to a run without the option. This demonstrates the behaivor:

$ cargo tarpaulin --test slow -- --show-output
Mar 01 15:32:49.394  INFO cargo_tarpaulin: Running Tarpaulin
Mar 01 15:32:49.394  INFO cargo_tarpaulin: Building project
   Compiling orthanc-cli v0.3.0 (/home/ay/projects/personal/orthanc-cli)
    Finished test [unoptimized + debuginfo] target(s) in 2.76s
Mar 01 15:32:52.315  INFO cargo_tarpaulin::process_handling::linux: Launching test
Mar 01 15:32:52.315  INFO cargo_tarpaulin::process_handling: running /home/ay/projects/personal/orthanc-cli/target/debug/deps/slow-a6c901d66227fc51

running 1 test
test test_slow ... ok

successes:

---- test_slow stdout ----
Test run time: 72


successes:
    test_slow

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Mar 01 15:32:52.547  INFO cargo_tarpaulin::report: Coverage Results:
|| Tested/Total Lines:
|| src/lib.rs: 0/272 +0%
|| tests/slow.rs: 12/12 +0%
|| tests/test.rs: 0/31 +0%
||
3.81% coverage, 12/315 lines covered, +0% change in coverage

$ cargo tarpaulin --test slow --follow-exec -- --show-output
Mar 01 15:32:59.651  INFO cargo_tarpaulin: Running Tarpaulin
Mar 01 15:32:59.651  INFO cargo_tarpaulin: Building project
   Compiling orthanc-cli v0.3.0 (/home/ay/projects/personal/orthanc-cli)
    Finished test [unoptimized + debuginfo] target(s) in 2.77s
Mar 01 15:33:02.587  INFO cargo_tarpaulin::process_handling::linux: Launching test
Mar 01 15:33:02.587  INFO cargo_tarpaulin::process_handling: running /home/ay/projects/personal/orthanc-cli/target/debug/deps/slow-a6c901d66227fc51

running 1 test
test test_slow ... ok

successes:

---- test_slow stdout ----
Test run time: 4134


successes:
    test_slow

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Mar 01 15:33:06.870  INFO cargo_tarpaulin::report: Coverage Results:
|| Tested/Total Lines:
|| src/cli.rs: 167/167
|| src/lib.rs: 15/275 +5.454545454545454%
|| src/main.rs: 15/221
|| src/utils.rs: 29/148
|| tests/slow.rs: 12/12 +0%
|| tests/test.rs: 0/31 +0%
||
27.87% coverage, 238/854 lines covered, +24.05932864949258% change in coverage

You can see how Test run time is 72 with no --follow-exec vs 4134 with follow-exec (times are in milliseconds). With the real tests the difference is 3 vs 18 minutes.

I have a branch that I used to create a minimal test: https://github.com/Ch00k/orthanc-cli/tree/slow-follow-exec https://github.com/Ch00k/orthanc-cli/blob/slow-follow-exec/Cargo.toml#L53-L55 is t he relevant Cargo.toml part

Happy to help debugging this. Is there anything I can do to profile it or something?

Ch00k avatar Mar 01 '21 14:03 Ch00k

Well when a binary is launched instead of detaching and letting it run at full speed, tarpaulin will grab the launch, instrument the executable with breakpoints which will involve parsing the executables dwarf tables (though this could be cached with a bit of a rearchitecture of things). And then trace the instructions in that binary.

So if your binary is running say the same volume of code as your tests directly run you could probably expect it to be best case 2x slower than without the option.

For profiling I guess a PR to solve this issue would help https://github.com/xd009642/tarpaulin/issues/686 I need to think of good ways to profile tarpaulin given the binary interactions add a significant layer of uncertainity

xd009642 avatar Mar 01 '21 15:03 xd009642

I would expect --follow-exec to be slower, but probably not 57x slower 😄 Back in the day when --follow-exec was first introduced (in the yanked 0.17 IIRC) I don't remember the tests being this slow.

Ch00k avatar Mar 01 '21 15:03 Ch00k

Yeah... I'll have a look at some point hopefully I can figure it out. Also back in the yanked 0.17 I was missing a number of checks to resolve the pid to the right executable that caused it to fail for most people. So probably don't want speed at the cost of that :sweat_smile:

xd009642 avatar Mar 01 '21 15:03 xd009642

So I tried running the orthanc-cli tests, setting the environment variables in the github actions and bringing up the docker-compose but I can't seem to get it so all the tests pass. Any tips?

xd009642 avatar Mar 05 '21 06:03 xd009642

You are probably missing the ORC_COMPLETION_OUTPUT_DIR environment variable. Try this:

export ORC_COMPLETION_OUTPUT_DIR=/tmp
cargo tarpaulin --test slow -- --show-output

You don't need any of the docker-compose stuff if running only the slow test set.

Ch00k avatar Mar 05 '21 08:03 Ch00k

Well this is fun, the slow test was very fast for me, I didn't notice a difference running it locally. I have to go out now but I'll have a further dig later today or tomorrow

xd009642 avatar Mar 05 '21 09:03 xd009642

Is there any difference if you run it as

cargo tarpaulin --test slow -- --show-output

vs

cargo tarpaulin --test slow --follow-exec -- --show-output?

Ch00k avatar Mar 05 '21 09:03 Ch00k

Okay used time instead of my perception and there is difference, it takes about 5s without --follow-exec and with it takes 8s. But that's not of the same level of magnitude as the difference you've seen between the two which suggests to me there's something else at play here :thinking:

xd009642 avatar Mar 05 '21 10:03 xd009642

Hm, that's interesting... I'll play with it some more, perhaps I can find what it is. Thanks for looking into it!

Ch00k avatar Mar 05 '21 11:03 Ch00k

I made a PR out of my branch, and added two jobs: with --follow-exec and withot --follow-exec: https://github.com/Ch00k/orthanc-cli/pull/69/checks With just one test the difference is not that noticeable, but it is nevertheless quite significant: 64ms vs. 4597ms. Inside the test itself there is a very dumb way of calculating the test run time: https://github.com/Ch00k/orthanc-cli/pull/69/files#diff-26b8d1dbbc79a7c080fe0575a80807a79b41cd122b7b36bb551b3968ccc4fa4fR22-R26

With the real tests the difference is 2 minutes (https://github.com/Ch00k/orthanc-cli/runs/2015233585?check_suite_focus=true) vs. 18 minutes (https://github.com/Ch00k/orthanc-cli/runs/2015233270?check_suite_focus=true). The magnitude of this difference is not as bad as 64 vs 4597, but it's still significant IMO.

Ch00k avatar Mar 06 '21 15:03 Ch00k

So since this was last discussed I added some ptrace changes that caused tests and binaries to properly run concurrently - it turned out ptrace gremlins meant things ran kinda serially but this lead to deadlocks in some tests that relied on the SIGCHLD signal. That did make things faster and removed a class of follow-exec errors but also added a rarer issue of spurious segfaults to some test execs :cry:

But now there's the --engine llvm which is approximately cargo test speeds and not ptrace based and is solving issues for some of those users as well.

With all this in mind I'm going to close this issue. But feel free to open a new issue if you have further issues etc

xd009642 avatar Oct 13 '22 17:10 xd009642