
Increase SERVER_TIMEOUT for L0_infer_valgrind

krishung5 opened this issue 3 years ago · 6 comments

Related PRs:
- common: https://github.com/triton-inference-server/common/pull/67
- backend: https://github.com/triton-inference-server/backend/pull/67
- tensorrt_backend: https://github.com/triton-inference-server/tensorrt_backend/pull/44

krishung5 commented Jul 25 '22

It shouldn't take this long (2 days) to load models, right? When did the test start timing out?

GuanLuo commented Jul 29 '22

@GuanLuo I don't expect the test to take 2 days either. I set the timeout to 2 days just in case it runs over the TIMEOUT and we have to restart it from scratch. In the last run, the test was killed because an 8h timeout is set in our CI pipeline configuration. I will tighten the TIMEOUT once we have a more accurate test duration.
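The trade-off above (a huge SERVER_TIMEOUT vs. the CI pipeline's 8h job kill) can be sketched as a wrapper that caps the test below the CI limit so a hang fails fast with a clear message instead of being killed opaquely. This is a hypothetical sketch, not the actual test harness; the 7h value is an assumed margin.

```python
# Hypothetical sketch: cap a long-running test below the CI job limit so a
# hang surfaces as an explicit timeout error rather than an opaque CI kill.
import subprocess

CI_JOB_LIMIT_S = 8 * 3600    # 8h CI pipeline kill limit mentioned in the thread
SERVER_TIMEOUT_S = 7 * 3600  # assumed margin below the CI limit

def run_with_timeout(cmd, timeout_s=SERVER_TIMEOUT_S):
    """Run cmd, raising a descriptive error if it exceeds timeout_s."""
    try:
        return subprocess.run(cmd, timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired:
        raise RuntimeError(
            f"test exceeded SERVER_TIMEOUT ({timeout_s}s); "
            f"CI would have killed it at {CI_JOB_LIMIT_S}s anyway")
```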

krishung5 commented Jul 29 '22

Removed the onnx and python backends from the valgrind tests. Added a possible memory leak introduced on the OpenVINO side. Made some changes in the OpenVINO backend.
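One common way to record a known leak for a valgrind-based test is a Memcheck suppression entry. A hypothetical sketch of what such an entry could look like (the entry name and object pattern are illustrative, not the actual change from this PR):

```
{
   openvino_possible_leak
   Memcheck:Leak
   match-leak-kinds: possible
   ...
   obj:*libopenvino*
}
```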

krishung5 commented Aug 02 '22

Rebased

krishung5 commented Aug 05 '22

Although L0_infer_valgrind passes locally, the test_class_bbb case with TF models fails on CI due to a timeout. Increased network_timeout for this case.
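Since valgrind slows execution substantially, a per-request network timeout tuned for native runs can trip only under valgrind. A minimal sketch of scaling the timeout by a slowdown factor (the base timeout and multiplier are assumptions, not values from this PR):

```python
# Hypothetical sketch: inflate a client-side network timeout when the test
# runs under valgrind, which can slow execution by an order of magnitude
# or more.
DEFAULT_NETWORK_TIMEOUT_S = 60.0  # assumed native-run timeout
VALGRIND_SLOWDOWN = 30            # assumed rough slowdown multiplier

def effective_timeout(base: float, under_valgrind: bool) -> float:
    """Return the request timeout, inflated when running under valgrind."""
    return base * VALGRIND_SLOWDOWN if under_valgrind else base
```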

krishung5 commented Aug 10 '22

L0_infer_valgrind passes on CI: https://gitlab-master.nvidia.com/dl/dgx/tritonserver/-/jobs/42488617

krishung5 commented Aug 11 '22