CI tests way too long on Windows!

Open gmaze opened this issue 1 year ago • 12 comments

We need to understand why CI tests take so much longer on Windows than on Linux and Mac.

They can take longer than the runner's 45-minute cancellation policy, e.g.:

https://github.com/euroargodev/argopy/actions/runs/11027230679/job/30627964822

which makes the CI test suite fail for the wrong reasons

Any idea how to improve this, or how to understand why this is the case?

gmaze avatar Sep 25 '24 08:09 gmaze

and yes, fail-fast is set to false on the workflow

gmaze avatar Sep 25 '24 08:09 gmaze

stupid me: timeout-minutes: 45 !

gmaze avatar Sep 25 '24 08:09 gmaze

why did we set such a timeout?

gmaze avatar Sep 25 '24 08:09 gmaze

well, that does not explain why CI tests remain much slower on Windows

gmaze avatar Sep 25 '24 08:09 gmaze

I did look at your setup but Windows has some aggressive file locking that can make some IO operations quite slow. Sometimes creating the environment is the bottleneck, and if that is the case here, I wonder if pixi could help. I can try a PR next week if you are interested in experimenting with it.

ocefpaf avatar Sep 25 '24 09:09 ocefpaf

thanks @ocefpaf ! let me check first if the env is the bottleneck

gmaze avatar Sep 25 '24 09:09 gmaze

building/restoring the env with micromamba takes about 5 min on Windows, even when the env is in cache, while it takes 1 min (build) or 15 seconds (restore from cache) on Linux and macOS, for envs of about the same size (600 vs 500 MB)

as an indication, the test suite takes about:

  • 30 min to run under Linux
  • 25 min under macOS
  • 50 min under Windows

so even if we installed the env in a matter of seconds, running the suite under Windows would remain slower by about 15 to 20 min

gmaze avatar Sep 25 '24 09:09 gmaze
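
A rough way to see the arithmetic in the timings above: the sketch below adds the environment setup time to the test suite time for each OS and compares the total with the 45-minute `timeout-minutes` setting. The numbers are the approximate figures quoted in the thread; the function and variable names are made up for illustration, and the "instant environment" case is a hypothetical best case, not a measurement.

```python
from typing import Optional

# Approximate timings quoted in the thread, in minutes.
TIMEOUT_MIN = 45.0  # timeout-minutes set in the workflow

# Environment build time (a cache restore on Linux/macOS is closer to 0.25 min).
ENV_SETUP_MIN = {"linux": 1.0, "macos": 1.0, "windows": 5.0}
TEST_SUITE_MIN = {"linux": 30.0, "macos": 25.0, "windows": 50.0}


def job_minutes(os_name: str, env_minutes: Optional[float] = None) -> float:
    """Estimated job time: environment setup plus test suite."""
    env = ENV_SETUP_MIN[os_name] if env_minutes is None else env_minutes
    return env + TEST_SUITE_MIN[os_name]


for name in TEST_SUITE_MIN:
    total = job_minutes(name)
    print(f"{name:8s} {total:5.1f} min  over timeout: {total > TIMEOUT_MIN}")

# Hypothetical best case: the Windows environment installs instantly.
best_case = job_minutes("windows", env_minutes=0.0)
print(f"windows, instant env: {best_case:.1f} min  over timeout: {best_case > TIMEOUT_MIN}")
```

With these figures the Windows job exceeds the timeout even in the best case, which is consistent with the conclusion that the environment setup is not the main bottleneck.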

This issue was automatically marked as stale because it has not seen any activity in 90 days

github-actions[bot] avatar Jan 08 '25 10:01 github-actions[bot]

this is still a relevant issue

also, despite #551, new BGC features (#535, #542) significantly increase the execution time of the full test suite

gmaze avatar Nov 13 '25 10:11 gmaze

and I also noticed that the CI tests with s3store fail regularly because the AWS authorisation token has expired by the time the relevant CI tests are reached !

e.g.: https://github.com/euroargodev/argopy/actions/runs/19314570118/job/55242916947?pr=551#step:11:2872

ERROR argopy\tests\test_stores_index.py::Test_IndexStore_pyarrow_BGC_bio::test_hosts[s3] - PermissionError: The provided token has expired.
[...]
E           botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the GetObject operation: The provided token has expired.

which I don't understand ! in the CI yml file we use:

role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/ga-ci-tests-argopy-01

and I increased the session duration of this role (ga-ci-tests-argopy-01) to 4 hours, which should be more than enough to cover the 1-hour CI test session

or is this an aiobotocore issue ?

gmaze avatar Nov 13 '25 10:11 gmaze

e.g. the 10 slowest tests under Windows (https://github.com/euroargodev/argopy/actions/runs/19314570118/job/55242916947?pr=551):

============================ slowest 10 durations =============================
74.37s call     test_fetchers_data_erddap_bgc.py::Test_Backend::test_fetching[ds='bgc', mode='expert', params='all', {'region': [-55, -47, 55, 57, 0, 10]}]
73.79s call     test_fetchers_data_erddap_bgc.py::Test_Backend::test_fetching_cached[ds='bgc', mode='expert', params='all', {'region': [-55, -47, 55, 57, 0, 10]}]
52.48s call     test_fetchers_data_erddap_bgc.py::Test_Backend::test_fetching_parallel_thread[ds='bgc', mode='expert', params='all', {'region': [-55, -47, 55, 57, 0, 10, '2022-05-1', '2023-07-01']}]
46.78s call     test_fetchers_data_erddap_bgc.py::Test_Backend::test_fetching_cached[ds='bgc', mode='expert', params='all', {'float': [5903248, 6904241]}]
42.25s call     test_fetchers_data_erddap_bgc.py::Test_Backend::test_fetching_parallel_thread[ds='bgc', mode='expert', params='all', {'float': [5903248, 6904241]}]
41.97s call     test_fetchers_data_erddap_bgc.py::Test_Backend::test_fetching[ds='bgc', mode='expert', params='all', {'float': [5903248, 6904241]}]
35.36s setup    test_extensions_carbonate_content.py::test_compute_weighted_mean_outputs_and_uncertainties
31.20s call     test_fetchers_facade_index.py::Test_Facade::test_plot_qc_altimetry
30.75s setup    test_stores_index.py::Test_IndexStore_pandas_CORE::test_a_search[s3, date[n=6], nrows=None]
29.16s setup    test_stores_index.py::Test_IndexStore_pandas_CORE::test_a_search[http_mocked, wmo[n=1], nrows=None]

gmaze avatar Nov 13 '25 10:11 gmaze
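
To see which test modules dominate a report like the one above, the durations can be summed per file. This is only a sketch: the regular expression assumes the `<seconds>s <phase> <file>::<test id>` layout visible in the log, the `seconds_by_file` helper is a made-up name, and the sample contains just five of the ten lines, so the totals are illustrative rather than a full profile.

```python
import re
from collections import defaultdict

# Five lines copied from the "slowest 10 durations" report in the thread.
REPORT = """
============================ slowest 10 durations =============================
74.37s call     test_fetchers_data_erddap_bgc.py::Test_Backend::test_fetching[ds='bgc', mode='expert', params='all', {'region': [-55, -47, 55, 57, 0, 10]}]
73.79s call     test_fetchers_data_erddap_bgc.py::Test_Backend::test_fetching_cached[ds='bgc', mode='expert', params='all', {'region': [-55, -47, 55, 57, 0, 10]}]
35.36s setup    test_extensions_carbonate_content.py::test_compute_weighted_mean_outputs_and_uncertainties
31.20s call     test_fetchers_facade_index.py::Test_Facade::test_plot_qc_altimetry
30.75s setup    test_stores_index.py::Test_IndexStore_pandas_CORE::test_a_search[s3, date[n=6], nrows=None]
"""

# "<seconds>s <phase> <file>::<rest of the test id>"
LINE_RE = re.compile(
    r"^(?P<secs>\d+(?:\.\d+)?)s\s+(?P<phase>call|setup|teardown)\s+(?P<file>[^:\s]+)::"
)


def seconds_by_file(report: str) -> dict:
    """Sum the reported durations per test file, slowest file first."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # header and blank lines are skipped
            totals[match.group("file")] += float(match.group("secs"))
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


for test_file, secs in seconds_by_file(REPORT).items():
    print(f"{secs:7.2f}s  {test_file}")
```

Running the same aggregation on the Linux and Windows reports side by side would show whether the extra time is spread evenly or concentrated in a few modules.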

> or is this an aiobotocore issue ?

I just found that https://github.com/marketplace/actions/configure-aws-credentials-action-for-github-actions has a default role-duration-seconds of 1 hour! Let's try increasing this to 2 hours

gmaze avatar Nov 13 '25 15:11 gmaze
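
The finding in the last comment can be sketched as follows, under the assumption that the role's maximum session duration only caps what may be requested, while the lifetime of the token is whatever duration is actually requested when the role is assumed (1 hour by default here). The helper names and the 65-minute job duration are hypothetical and used only for illustration.

```python
HOUR = 3600


def effective_session_seconds(role_max_seconds: int, requested_seconds: int = HOUR) -> int:
    """Session lifetime actually obtained when assuming the role.

    Assumption: the role's maximum session duration is an upper bound only;
    the token lives for the requested duration (default: 1 hour).
    """
    if requested_seconds > role_max_seconds:
        raise ValueError("requested duration exceeds the role's maximum session duration")
    return requested_seconds


def token_outlives_job(session_seconds: int, job_seconds: int) -> bool:
    """True if the credentials stay valid for the whole job."""
    return session_seconds >= job_seconds


ROLE_MAX = 4 * HOUR   # session duration configured on the role in the thread
JOB = 65 * 60         # hypothetical job duration, slightly over one hour

default_session = effective_session_seconds(ROLE_MAX)             # role-duration-seconds left unset
longer_session = effective_session_seconds(ROLE_MAX, 2 * HOUR)    # role-duration-seconds: 7200

print("default 1 h token covers the job:", token_outlives_job(default_session, JOB))
print("2 h token covers the job:", token_outlives_job(longer_session, JOB))
```

Under that assumption, raising the role's maximum to 4 hours has no effect on its own; the requested duration has to be raised as well.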