Speed up tests
Current goals:
- < 45 min on PR
- < 20 concurrent jobs on PR (the enterprise tier of GitHub Actions has a limit of 180 concurrent jobs)
Current state (for dd27013 in master):
- ~ 50 min
- 24 parallel jobs
Further improvements should focus on:
- `test_series.py`
- `test_groupby.py`
- `test_default.py`
`pytest modin/pandas/test/` takes me 2.5 hours locally (the pandas test suite with `--skip-slow --skip-db` takes 23 minutes).
Of this, `test_join_sort` takes 55 minutes, `test_groupby` takes 37, `test_series` takes 18, and `test_indexing` takes 17.
@jbrockmendel that local command only runs tests with the Ray engine. It doesn't test the Dask engine at all, and it skips many tests that the CI runs. Here is an example recent successful CI run. The whole thing took 1 hour 17 minutes. Note that the CI benefits from parallelism across tests.
> Of this, `test_join_sort` takes 55 minutes, `test_groupby` takes 37, `test_series` takes 18, and `test_indexing` takes 17.

I know `test_join_sort` has way too many test cases.
In this vein, test-compat-win (engine dask, python 3.6) has been notoriously slow in CI for some time now. One thing we could do to make CI faster for test-compat-win is to launch parallel pytest jobs, similar to what we do for test-windows right now.
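For illustration, a minimal sketch of what such a split could look like, assuming a matrix strategy like the one test-windows already uses; the group contents and file lists below are hypothetical, not our actual configuration:

```yaml
# Hypothetical sketch: split test-compat-win into parallel matrix jobs,
# mirroring the group_1..group_4 split that test-windows uses.
test-compat-win:
  runs-on: windows-latest
  strategy:
    matrix:
      # Each entry is a disjoint subset of the test files, run as its own job.
      test-group:
        - modin/pandas/test/test_series.py
        - modin/pandas/test/dataframe/test_map_metadata.py modin/pandas/test/test_io.py
  steps:
    - uses: actions/checkout@v3
    - name: Run one group of tests
      run: python -m pytest ${{ matrix.test-group }}
```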
So I analyzed the current state of CI a little and found that we now run 37 jobs on each CI run for PRs. Some jobs run for more than 1.5 hours each, and quite a few run longer than an hour.
Table of jobs ordered by run time:

| Job | Run time (minutes) |
|---|---|
| test-windows (engine ray, python 3.8, group_3) | 109.17 |
| test-windows (engine dask, python 3.8, group_3) | 104.98 |
| test-windows (engine ray, python 3.8, group_4) | 93.65 |
| test-windows (engine dask, python 3.8, group_4) | 85.08 |
| test-asv-benchmarks | 78.17 |
| test-ubuntu (engine unidist mpi, python 3.8) | 62.42 |
| test-ubuntu (engine dask, python 3.8, group_4) | 50.22 |
| test-ubuntu (engine dask, python 3.8, group_3) | 49.5 |
| test-windows (engine ray, python 3.8, group_1) | 44.12 |
| test-windows (engine dask, python 3.8, group_1) | 43.55 |
| test-ubuntu (engine ray, python 3.8, group_4) | 39.28 |
| test-windows (engine dask, python 3.8, group_2) | 38.18 |
| test-ubuntu (engine dask, python 3.8, group_1) | 34.55 |
| test-ubuntu (engine ray, python 3.8, group_1) | 33.78 |
| Test HDK storage format, Python 3.8 | 33.35 |
| test-ubuntu (engine dask, python 3.8, group_2) | 28.3 |
| test-windows (engine ray, python 3.8, group_2) | 23.68 |
| test-ubuntu (engine python, python 3.8, group_1) | 23.42 |
| Test BaseOnPython execution, Python 3.8 | 18.48 |
| test-ubuntu (engine ray, python 3.8, group_3) | 16.95 |
| test-ubuntu (engine ray, python 3.8, group_2) | 11.28 |
| test experimental | 9.55 |
| test-internals | 7.5 |
| lint (mypy) | 5.5 |
| test-spreadsheet (engine ray, python 3.8) | 5.03 |
| test cloud | 4.65 |
| test (pyarrow, python 3.8) | 4.62 |
| test-no-engine | 4.07 |
| test api | 3.98 |
| test-headers | 3.95 |
| test-spreadsheet (engine dask, python 3.8) | 3.95 |
| lint (pydocstyle) | 2.73 |
| test-clean-install-windows | 2.53 |
| test-clean-install-ubuntu | 1.28 |
| lint (black) | 0.37 |
| lint (flake8) | 0.35 |
| upload-coverage | 0.32 |
My proposal would be to differentiate what we run on PRs vs. what we run on commits to `master` and `release/*` branches (right now we run the same set of tests, plus some extra tests on master like testing ray-client).
To sum up, I propose the following changes to CI runs for pull requests:
- Drop the most expensive tests like `groupby`, `sort` and `join` on Ray, Dask and Unidist on Linux
  - Run those anyway if special files concerning Ray/Dask/Unidist were touched (`modin/execution/ray` etc); see the sketch after this list
- Drop almost all the tests except probably the "super-sanity" set (`dataframe/test_map_metadata.py`, `test_series.py`, `test_io.py`) on Ray and Dask on Windows
- Add running the whole test suite on the `PandasOnPython` backend on Windows
- Think more about what we run regarding ASV: whether we really care about that in PRs, or maybe we can drop this from PRs and instead run this on merges to master, comparing the current and previous commit or something similar? cc @anmyachev
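As a sketch of the "run those anyway if engine files were touched" item above, a change-detection job could gate the expensive engine tests. The paths and job wiring here are assumptions for illustration (dorny/paths-filter is one off-the-shelf action for this), not our actual workflow:

```yaml
# Hypothetical sketch: detect which engine directories a PR touches,
# and run the expensive groupby/sort/join jobs only for affected engines.
execution-filter:
  runs-on: ubuntu-latest
  outputs:
    ray: ${{ steps.filter.outputs.ray }}
    dask: ${{ steps.filter.outputs.dask }}
    unidist: ${{ steps.filter.outputs.unidist }}
  steps:
    - uses: actions/checkout@v3
    - uses: dorny/paths-filter@v2
      id: filter
      with:
        filters: |
          ray:
            - 'modin/execution/ray/**'
          dask:
            - 'modin/execution/dask/**'
          unidist:
            - 'modin/execution/unidist/**'

test-expensive-ray:
  needs: execution-filter
  # Skipped unless the PR touched Ray-specific code.
  if: needs.execution-filter.outputs.ray == 'true'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - run: python -m pytest modin/pandas/test/test_groupby.py modin/pandas/test/test_join_sort.py
```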
Note: I'm not proposing to apply this to merges to `master` or release branches! That CI should stay the same, testing the whole suite.
Expected effects:
- Advantages: the run time of the CI on pull requests should be much shorter, and maybe we would be able to join back all the "groups" we introduced a while back. cc @dchigarev
- Disadvantages: this slightly increases the risk of merging something that would break in the cases we stopped testing (like breaking `groupby` on Ray), and we would only know when `master` CI fails. This also means someone would have to watch CI failures on `master` more closely than we do now.
- Neutral: it seems that code coverage measurements in PRs would become completely moot, so maybe we can consider not gathering them and not failing on a certain threshold.
Another note: this PR https://github.com/modin-project/modin/pull/6267 seems to be a good example where testing only `PandasOnPython` is not enough; we should try to keep at least some tests which bump into Ray peculiarities. It seems that `test_map_metadata.py` is a good candidate.
> Think more about what we run regarding ASV: whether we really care about that in PRs, or maybe we can drop this from PRs and instead run this on merges to master, comparing the current and previous commit or something similar? cc @anmyachev

ASV tests are only run when there are changes in the `asv_bench` folder, which is quite rare.
> Add running the whole test suite on the `PandasOnPython` backend on Windows

@vnlitvinov do you mean `BaseOnPython` execution? Or are we just not testing `PandasOnPython` at all right now, so I don't see it?
> Drop almost all the tests except probably the "super-sanity" set (`dataframe/test_map_metadata.py`, `test_series.py`, `test_io.py`) on Ray and Dask on Windows
I guess it's OK to start with that, and besides, it shouldn't affect coverage since we don't have Windows-specific code.
> @vnlitvinov do you mean `BaseOnPython` execution? Or are we just not testing `PandasOnPython` at all right now, so I don't see it?
We aren't testing `PandasOnPython` on Windows at all, and `BaseOnPython` is a different thing; it cannot substitute for testing Ray or Dask because it uses a different query compiler (and it's not using the partitioning mechanism).
> ASV tests are only run when there are changes in the `asv_bench` folder, which is quite rare.
This isn't completely true: the job still installs everything before it sees there were no changes and quits, which takes ~2 minutes. But yeah, it's a minor thing, I agree. Thanks for pointing it out to me!
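If that overhead ever matters, one way to avoid it entirely would be a workflow-level paths trigger, which skips the job before any runner setup happens. A minimal sketch, assuming the ASV job lives in its own workflow file (which may not match our current layout):

```yaml
# Hypothetical sketch: trigger the ASV workflow only when asv_bench changes,
# so no runner is started (and nothing is installed) for unrelated PRs.
name: test-asv-benchmarks
on:
  pull_request:
    paths:
      - 'asv_bench/**'
jobs:
  test-asv-benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # The actual benchmark invocation would go here.
      - run: echo "run ASV benchmarks"
```

One caveat: a workflow filtered out this way reports no check at all, which can block merging if that check is marked as required.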