
Test timeouts when building Docker image

Open sarahec opened this issue 3 years ago • 1 comments

When building a Docker image (docker build . -f Dockerfile-ubuntu-22.04), two tests time out at 300s each: //xls/ir:function_test and //xls/ir:node_test. For reference, the next longest test -- //xls/fuzzer:fuzz_coverage_test -- completes in 49.9s.

When I change ir:node_test to "large", it passes in 600.9s and ir:function_test times out at 900s.

The machine: Linux 5.15, Ryzen 5950X (12C/24T), 32GB RAM, and no other significant processes running. The load average when running the function_test is around 0.5, and memory usage is under 5%. This makes me wonder if the problem is that it's looking for something on the network and not finding it.

sarahec avatar Jun 17 '22 19:06 sarahec

Hah. ir:function_test passes in 900.8s when given an eternal timeout.

Those times seem suspicious to me -- 600.9s and 900.8s -- as if a deadlock was resolved that allowed the tests to complete.
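To make the arithmetic behind that suspicion concrete, here is a minimal sketch. The two tables restate Bazel's documented default timeouts; the 300s step is only a hypothesis (it matches the original "moderate" timeout), and the helper name overshoot is made up for illustration.

```python
# Bazel's default timeout per test size, in seconds.
DEFAULT_TIMEOUT_S = {"small": 60, "medium": 300, "large": 900, "enormous": 3600}
# Names accepted by a test rule's `timeout` attribute, with the same values.
TIMEOUT_ATTR_S = {"short": 60, "moderate": 300, "long": 900, "eternal": 3600}


def overshoot(elapsed_s, step_s=300.0):
    """Return (whole steps, seconds past the last whole step).

    step_s=300 is an assumed unit, not something the logs confirm.
    """
    steps = int(elapsed_s // step_s)
    return steps, round(elapsed_s - steps * step_s, 1)


# Wall-clock times reported in this thread.
for name, elapsed in [("ir:node_test", 600.9), ("ir:function_test", 900.8)]:
    steps, extra = overshoot(elapsed)
    print(f"{name}: {steps} x 300s + {extra}s")
```

Both runs land within a second of a whole multiple of 300s, which is what a test doing about a second of real work after sitting through two or three fixed-length waits would look like. That is consistent with the data but does not prove it.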

sarahec avatar Jun 17 '22 20:06 sarahec

That's interesting -- sorry for the delayed reply and thanks for reporting @sarahec

Once those targets have compiled and the tests are running, there should be no network I/O. I wonder if we can force our docker configuration to deny any outbound connections before running the test step. (We have to pull packages for the bazel build step, but should not need to for the test step.) If you happen to know the docker-fu to make that happen we'd be happy to take a PR.

I'm glad switching to eternal fixed it, but I agree those times are suspicious, especially given the machine specs! Not to works-on-my-machine you, but I kicked off a docker build on my machine to see what it looks like on my side, and will let you know.

cdleary avatar Oct 28 '22 00:10 cdleary

Snippets from my docker run output:

//xls/ir:function_test                                                   PASSED in 1.9s
//xls/ir:node_test                                                       PASSED in 2.8s
//xls/fuzzer:run_fuzz_test                                               PASSED in 149.3s  

So something definitely seems odd in your run. I was hoping container tech would mostly fix reproducibility questions like this! :-) My box is dual socket E5-2680 but I do have 64GiB of RAM and I run sudo swapoff -a -- is it possible some of these ended up swapping due to concurrency?
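One way to check the swapping theory is to look at SwapTotal and SwapFree in /proc/meminfo while the tests run. The following is a rough sketch rather than a vetted diagnostic; the function names are made up for illustration, and it assumes a Linux host (inside a container, /proc/meminfo typically still shows the host's figures).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Values are the leading integers on each line (kB for most fields).
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap currently in use; 0 when swap is off or the fields are missing."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}  # not Linux, or /proc is unavailable
    print(f"swap in use: {swap_used_kb(info)} kB")
```

If this stays near zero during the slow tests, swapping is unlikely to be the cause; with swap disabled via swapoff -a, SwapTotal itself reads 0.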

I'm going to close as cannot repro for now, but let us know (by filing or reopening) if you see any further issues -- there's also the bazel --jobs flag, which can be used to limit the concurrency of test execution (--jobs=1 serializes it).

cdleary avatar Oct 28 '22 20:10 cdleary