
Massive test time regressions

Open devmotion opened this issue 3 years ago • 2 comments

It seems #1766 introduced, possibly indirectly, massive test time regressions. For instance, the Turing-CI workflow for the previous commit on the master branch finished successfully in 1h 26min (which still seems surprisingly slow compared with earlier timings; IIRC at some point the test suite finished in < 50 min), but after #1766 was merged it took 3h 41min: https://github.com/TuringLang/Turing.jl/actions/workflows/TuringCI.yml?query=branch%3Amaster The PR uncommented some tests, but the increase seems too large to be explained by that alone.

I noticed this issue (?) when running the integration tests in DynamicPPL: recently they have been running for > 2 hours (https://github.com/TuringLang/DynamicPPL.jl/actions/workflows/IntegrationTest.yml), whereas some months ago they took < 50 mins (e.g., https://github.com/TuringLang/DynamicPPL.jl/runs/3558682402?check_suite_focus=true).

devmotion avatar Feb 11 '22 19:02 devmotion

DynamicPPL-specific tests also take more time: they recently went up from 10-15 mins to 22-28 mins: https://github.com/TuringLang/DynamicPPL.jl/actions/workflows/CI.yml?query=branch%3Amaster

devmotion avatar Feb 11 '22 19:02 devmotion

Libtask incurs some performance regression, likely because the current implementation is not yet optimal. Still, it should not be hard to optimise it for better performance now that it is fully functioning. Another change in https://github.com/TuringLang/Turing.jl/pull/1766 is that the number of MCMC iterations was increased. Although these increases are relatively minor, repeating them across several different AD backends could cause a visible test time increase.
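
As a rough illustration of how a minor per-test increase adds up once it is repeated across backends, here is a minimal sketch. All numbers (iteration counts, per-iteration cost, number of backends and sampler tests) are hypothetical placeholders, not measurements from Turing's CI, and the function name is made up for this example.

```python
# Hypothetical numbers for illustration only; nothing here is measured from CI.

def sampling_ms(n_iterations, ms_per_iteration, n_backends, n_sampler_tests):
    """Total sampling time (ms) when each sampler test runs once per AD backend."""
    return n_iterations * ms_per_iteration * n_backends * n_sampler_tests

# One sampler test on one backend: 1000 -> 1500 iterations at 2 ms each.
single_before = sampling_ms(1000, 2, n_backends=1, n_sampler_tests=1)
single_after = sampling_ms(1500, 2, n_backends=1, n_sampler_tests=1)

# The same change repeated for 4 backends and 10 sampler tests.
suite_before = sampling_ms(1000, 2, n_backends=4, n_sampler_tests=10)
suite_after = sampling_ms(1500, 2, n_backends=4, n_sampler_tests=10)

print("extra per single test (ms):", single_after - single_before)
print("extra for whole suite (ms):", suite_after - suite_before)
```

The relative increase is the same in both cases, but the absolute extra time scales with the number of backend and test combinations, which is why a small bump per test can become visible at the suite level.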

yebai avatar Feb 11 '22 21:02 yebai

This is largely fixed by various updates to Libtask. It will get even better after https://github.com/TuringLang/Turing.jl/pull/1858 is merged.

yebai avatar Nov 12 '22 20:11 yebai