Speed up tests
Current goals:
- < 45 min on PR
- < 20 concurrent jobs on PR (the enterprise tier of GitHub Actions has a limit of 180 concurrent jobs)
Current state (for dd27013 in master):
- ~ 50 min
- 24 parallel jobs
Further improvements should focus on:
- `test_series.py`
- `test_groupby.py`
- `test_default.py`
`pytest modin/pandas/test/` takes me 2.5 hours locally (the pandas test suite with `--skip-slow --skip-db` takes 23 minutes).
Of this, `test_join_sort` takes 55 minutes, `test_groupby` takes 37, `test_series` takes 18, and `test_indexing` takes 17.
@jbrockmendel that local command only runs tests with the Ray engine. It doesn't test the Dask engine at all, and it skips many tests that the CI runs. Here is an example recent successful CI run. The whole thing took 1 hour 17 minutes. Note that the CI benefits from parallelism across tests.
> Of this, `test_join_sort` takes 55 minutes, `test_groupby` takes 37, `test_series` takes 18, and `test_indexing` takes 17.

I know `test_join_sort` has way too many test cases.
In this vein, test-compat-win (engine dask, python 3.6) has been notoriously slow in CI for some time now. One thing we could do to make CI faster for test-compat-win is to launch parallel pytest jobs, similar to what we do for test-windows right now.
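For illustration, a minimal sketch of what such a split could look like, assuming a matrix strategy like the one test-windows already uses; the group contents and file lists below are hypothetical, not our actual configuration:

```yaml
# Hypothetical sketch: split test-compat-win into parallel matrix jobs,
# mirroring the group_1..group_4 split that test-windows uses.
test-compat-win:
  runs-on: windows-latest
  strategy:
    matrix:
      # Each entry is a disjoint subset of the test files, run as its own job.
      test-group:
        - modin/pandas/test/test_series.py
        - modin/pandas/test/dataframe/test_map_metadata.py modin/pandas/test/test_io.py
  steps:
    - uses: actions/checkout@v3
    - name: Run one group of tests
      run: python -m pytest ${{ matrix.test-group }}
```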
So I analyzed the current state of CI a little and found that we now run 37 jobs on each CI run for PRs. Some jobs run for more than 1.5 hours each, and quite a few run longer than an hour.
Table of jobs ordered by run time:

| Job | Run time (minutes) |
|---|---|
| test-windows (engine ray, python 3.8, group_3) | 109.17 |
| test-windows (engine dask, python 3.8, group_3) | 104.98 |
| test-windows (engine ray, python 3.8, group_4) | 93.65 |
| test-windows (engine dask, python 3.8, group_4) | 85.08 |
| test-asv-benchmarks | 78.17 |
| test-ubuntu (engine unidist mpi, python 3.8) | 62.42 |
| test-ubuntu (engine dask, python 3.8, group_4) | 50.22 |
| test-ubuntu (engine dask, python 3.8, group_3) | 49.5 |
| test-windows (engine ray, python 3.8, group_1) | 44.12 |
| test-windows (engine dask, python 3.8, group_1) | 43.55 |
| test-ubuntu (engine ray, python 3.8, group_4) | 39.28 |
| test-windows (engine dask, python 3.8, group_2) | 38.18 |
| test-ubuntu (engine dask, python 3.8, group_1) | 34.55 |
| test-ubuntu (engine ray, python 3.8, group_1) | 33.78 |
| Test HDK storage format, Python 3.8 | 33.35 |
| test-ubuntu (engine dask, python 3.8, group_2) | 28.3 |
| test-windows (engine ray, python 3.8, group_2) | 23.68 |
| test-ubuntu (engine python, python 3.8, group_1) | 23.42 |
| Test BaseOnPython execution, Python 3.8 | 18.48 |
| test-ubuntu (engine ray, python 3.8, group_3) | 16.95 |
| test-ubuntu (engine ray, python 3.8, group_2) | 11.28 |
| test experimental | 9.55 |
| test-internals | 7.5 |
| lint (mypy) | 5.5 |
| test-spreadsheet (engine ray, python 3.8) | 5.03 |
| test cloud | 4.65 |
| test (pyarrow, python 3.8) | 4.62 |
| test-no-engine | 4.07 |
| test api | 3.98 |
| test-headers | 3.95 |
| test-spreadsheet (engine dask, python 3.8) | 3.95 |
| lint (pydocstyle) | 2.73 |
| test-clean-install-windows | 2.53 |
| test-clean-install-ubuntu | 1.28 |
| lint (black) | 0.37 |
| lint (flake8) | 0.35 |
| upload-coverage | 0.32 |
My proposal would be to differentiate what we run on PRs vs. what we run on commits to `master` and `release/*` branches (right now we run the same set of tests, plus some extra tests on master like testing ray-client).
To sum up, I propose the following changes to CI runs for pull requests:
- Drop the most expensive tests like `groupby`, `sort` and `join` on Ray, Dask and Unidist on Linux
  - Run those anyway if special files concerning Ray/Dask/Unidist were touched (`modin/execution/ray` etc); see the sketch after this list
- Drop almost all the tests except probably the "super-sanity" set (`dataframe/test_map_metadata.py`, `test_series.py`, `test_io.py`) on Ray and Dask on Windows
- Add running the whole test suite on the `PandasOnPython` backend on Windows
- Think more about what we run regarding ASV: whether we really care about that in PRs, or maybe we can drop this from PRs and instead run this on merges to master, comparing the current and previous commit or something similar? cc @anmyachev
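As a sketch of the "run those anyway if engine files were touched" item above, a change-detection job could gate the expensive engine tests. The paths and job wiring here are assumptions for illustration (dorny/paths-filter is one off-the-shelf action for this), not our actual workflow:

```yaml
# Hypothetical sketch: detect which engine directories a PR touches,
# and run the expensive groupby/sort/join jobs only for affected engines.
execution-filter:
  runs-on: ubuntu-latest
  outputs:
    ray: ${{ steps.filter.outputs.ray }}
    dask: ${{ steps.filter.outputs.dask }}
    unidist: ${{ steps.filter.outputs.unidist }}
  steps:
    - uses: actions/checkout@v3
    - uses: dorny/paths-filter@v2
      id: filter
      with:
        filters: |
          ray:
            - 'modin/execution/ray/**'
          dask:
            - 'modin/execution/dask/**'
          unidist:
            - 'modin/execution/unidist/**'

test-expensive-ray:
  needs: execution-filter
  # Skipped unless the PR touched Ray-specific code.
  if: needs.execution-filter.outputs.ray == 'true'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - run: python -m pytest modin/pandas/test/test_groupby.py modin/pandas/test/test_join_sort.py
```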
Note: I'm not proposing to apply this to merges to `master` or release branches! That CI should stay the same, testing the whole suite.
Expected effects:
- Advantages: the run time of the CI on pull requests should be much shorter, and maybe we would be able to join back all the "groups" we introduced a while back. cc @dchigarev
- Disadvantages: this slightly increases the risk of merging something that would break in the cases we stopped testing (like breaking `groupby` on Ray), and we would only know when `master` CI fails. This also means someone would have to watch CI failures on `master` more closely than we do now.
- Neutral: it seems that code coverage measurements in PRs would become completely moot, so maybe we can consider not gathering them and not failing on a certain threshold.
Another note: this PR https://github.com/modin-project/modin/pull/6267 seems to be a good example where testing only `PandasOnPython` is not enough; we should try to keep at least some tests which bump into Ray peculiarities. It seems that `test_map_metadata.py` is a good candidate.
> Think more about what we run regarding ASV: whether we really care about that in PRs, or maybe we can drop this from PRs and instead run this on merges to master, comparing the current and previous commit or something similar? cc @anmyachev

ASV tests are only run when there are changes in the `asv_bench` folder, which is quite rare.
> Add running the whole test suite on the `PandasOnPython` backend on Windows

@vnlitvinov do you mean `BaseOnPython` execution? Or are we just not testing `PandasOnPython` at all right now, so I don't see it?
> Drop almost all the tests except probably the "super-sanity" set (`dataframe/test_map_metadata.py`, `test_series.py`, `test_io.py`) on Ray and Dask on Windows
I guess it's OK to start with that, and besides, it shouldn't affect coverage since we don't have Windows-specific code.
> @vnlitvinov do you mean `BaseOnPython` execution? Or are we just not testing `PandasOnPython` at all right now, so I don't see it?
We aren't testing `PandasOnPython` on Windows at all, and `BaseOnPython` is a different thing; it cannot substitute for testing Ray or Dask because it uses a different query compiler (and it's not using the partitioning mechanism).
> ASV tests are only run when there are changes in the `asv_bench` folder, which is quite rare.
This isn't completely true: the job still installs everything before it sees there were no changes and quits, which takes ~2 minutes. But yeah, it's a minor thing, I agree. Thanks for pointing it out to me!
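If that overhead ever matters, one way to avoid it entirely would be a workflow-level paths trigger, which skips the job before any runner setup happens. A minimal sketch, assuming the ASV job lives in its own workflow file (which may not match our current layout):

```yaml
# Hypothetical sketch: trigger the ASV workflow only when asv_bench changes,
# so no runner is started (and nothing is installed) for unrelated PRs.
name: test-asv-benchmarks
on:
  pull_request:
    paths:
      - 'asv_bench/**'
jobs:
  test-asv-benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # The actual benchmark invocation would go here.
      - run: echo "run ASV benchmarks"
```

One caveat: a workflow filtered out this way reports no check at all, which can block merging if that check is marked as required.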