
pytest: Mark tests using space_time_raster_dataset as needs_solo_run

Open · echoix opened this pull request 1 year ago · 4 comments

The test fixture space_time_raster_dataset used in some pytest tests, including in some of the Jupyter tests, is unexpectedly slow at the setup stage when running in pytest with multiple workers. There might be a deadlock problem, or it might just use many processes or threads by itself.

Since it causes some intermittent timeouts, this PR adds the needs_solo_run marker so that these tests run separately, without other tests in parallel (a minimal sketch of the marker usage follows below). The setup stage is still slow, but after a couple of tries I strongly expect this will prevent some more of the timeouts that were still present after https://github.com/OSGeo/grass/pull/3879.

On the positive side, there are (hopefully) fewer flaky tests to retry, which can cause delays when there is a lot of activity. On the negative side, since these tests are slow and the setup phase of that fixture is also slow, the overall time taken for the tests increases compared to the sum of the parallel and serial test steps' durations. That means we are temporarily giving up some of the speedup from running them in parallel. It is still better than having no parallel workers, as before. Either way, if some of the deadlock issues are caused by multiprocessing using the fork start method, they will need to be fixed before any of these tests can run on macOS and Windows. It's a work in progress.

echoix avatar Jun 28 '24 22:06 echoix
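For context, here is a minimal sketch of how a marker like needs_solo_run can be applied to a test module and then used to split the run into a parallel pass and a serial pass with pytest-xdist. The stub fixture and test below are placeholders, not the actual GRASS test code, and the exact invocation is an assumption about how CI might use the marker.

```python
# Sketch only: applying a needs_solo_run marker to a module whose tests use a
# slow fixture. The fixture here is a stub so the example runs on its own; the
# real space_time_raster_dataset fixture lives in the GRASS test suite.
import pytest

# Applies the marker to every test in this module.
pytestmark = pytest.mark.needs_solo_run


@pytest.fixture
def space_time_raster_dataset():
    # Stand-in for the real (slow) GRASS fixture.
    return object()


def test_uses_the_dataset(space_time_raster_dataset):
    # Placeholder test body: only checks that the fixture was provided.
    assert space_time_raster_dataset is not None
```

Assuming the marker is registered in the pytest configuration (a `markers = needs_solo_run: ...` entry), CI can then run `pytest -m "not needs_solo_run" -n auto` for the parallel workers and `pytest -m needs_solo_run` afterwards for the marked tests, which is the parallel+serial split the description above refers to.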

Also note that the "coverage" change you might see is still flawed, as this PR adds new files "seen" by the coverage tool. It doesn't know about all the files in the repo yet, so it now sees 54 more new files.

echoix avatar Jun 28 '24 22:06 echoix

> The test fixture space_time_raster_dataset used in some pytest tests, including in some of the Jupyter tests, is unexpectedly slow at the setup stage when running in pytest with multiple workers. There might be a deadlock problem, or it might just use many processes or threads by itself.

Wild guess without checking the code: could it be a SQLite locking issue due to concurrent SQLite (read/)write access?

neteler avatar Jun 29 '24 14:06 neteler
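To make that guess concrete, here is a minimal, self-contained sketch (not TGIS code; the file name and table are made up) of how concurrent SQLite write access can surface as locking: a second connection trying to write while another connection holds an open write transaction gets a "database is locked" error instead of proceeding.

```python
# Illustration of SQLite write-lock contention between two connections.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tgis_like.db")

writer_a = sqlite3.connect(path, timeout=0)  # timeout=0: fail instead of waiting
writer_b = sqlite3.connect(path, timeout=0)

writer_a.execute("CREATE TABLE maps (name TEXT)")
writer_a.commit()

# writer_a starts a write transaction and keeps it open (no commit yet).
writer_a.execute("INSERT INTO maps VALUES ('raster_1')")

try:
    writer_b.execute("INSERT INTO maps VALUES ('raster_2')")
except sqlite3.OperationalError as err:
    print(err)           # "database is locked" while writer_a's transaction is open
    writer_b.rollback()  # give up this attempt so writer_a can commit cleanly

writer_a.commit()  # releasing the lock lets the other writer proceed
writer_b.execute("INSERT INTO maps VALUES ('raster_2')")
writer_b.commit()
```

In a real run, a busy timeout (or retries) would make the second writer wait instead of fail, which would show up as slow setup rather than an error.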

> The test fixture space_time_raster_dataset used in some pytest tests, including in some of the Jupyter tests, is unexpectedly slow at the setup stage when running in pytest with multiple workers. There might be a deadlock problem, or it might just use many processes or threads by itself.

> Wild guess without checking the code: could it be a SQLite locking issue due to concurrent SQLite (read/)write access?

I'm not sure; trying to follow what gets called within the temporal modules is a nightmare, since everything gets invoked/imported. Technically, it is supposed to be 7 rasters, and that fixture is valid for all the tests. But something isn't quite right.

Once coverage is properly set up (maybe once C code is also tracked), it will be useful to change the tests incrementally and see what changes, in order to simplify redundant tests (by comparing the coverage provided by one test versus another).

In a separate attempt last weekend to profile some tests (gunittest), I started with only one subfolder in temporal, t.rast.algebra, and a lot of time (18%) was spent on a single inner loop of PLY that checked whether an element was in a list (see the sketch after this comment). Other than that, a big chunk of time was wasted importing tgis, as it imports almost everything with star imports.

So the problem can be anywhere.

For now, the goal is only to prevent useless failures, so that we get correct feedback earlier (without waiting to retry a job).

echoix avatar Jun 29 '24 14:06 echoix
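Regarding the PLY membership-check observation above, here is a minimal, self-contained sketch (not the actual PLY or tgis code; all names are made up) of that kind of hotspot and the usual fix: an inner loop testing membership against a list is linear per lookup, a set makes each lookup roughly constant-time, and cProfile makes the difference easy to see.

```python
import cProfile
import random

# Made-up data standing in for whatever the parser loops over.
KNOWN_TOKENS = [f"tok{i}" for i in range(1000)]
STREAM = [f"tok{random.randrange(2000)}" for _ in range(100_000)]


def count_known_with_list():
    # List membership: every `in` may scan the whole list (the reported hotspot).
    return sum(1 for t in STREAM if t in KNOWN_TOKENS)


def count_known_with_set():
    # Same result with a set: hashing makes each membership test cheap.
    known = set(KNOWN_TOKENS)
    return sum(1 for t in STREAM if t in known)


if __name__ == "__main__":
    cProfile.run("count_known_with_list()")  # the list version dominates the profile
    cProfile.run("count_known_with_set()")
```

If the PLY loop really does spend 18% of the run in a list membership test, swapping the list for a set (or a dict) is usually the first thing to try, assuming ordering and duplicates don't matter at that point in the code.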

I have other changes queued for these files (dating from the last day of the sprint), so I'm waiting for this to be merged first.

echoix avatar Jun 29 '24 14:06 echoix

I had to rerun about 4-5 pytest failures on main that passed on a second try. I would have hoped that this had already fixed them.

echoix avatar Jul 01 '24 11:07 echoix

Now I'm trying to push for this PR to be merged, as we still have some flaky pytest timeouts that I'm convinced this will help with. It was also blocking me from continuing other pytest improvements last weekend. This weekend will be quieter on my end, though.

There's nothing left to change in this PR.

echoix avatar Jul 05 '24 16:07 echoix

I'm curious to see if the dbif.close() changes from https://github.com/OSGeo/grass/pull/3996/files#diff-30a61f23aa90129beedbff09dd723876e2a54703c7355eaef4e45a8827eaecda would change something…

They haven't been in long enough to tell, and in the CI on my side, on my PR that comes before all of them, I think I still caught some failures.

echoix avatar Jul 09 '24 01:07 echoix

I agree, and for me it is not clear either how the TGIS temporal database interactions are managed overall, or what the overarching concept in DB handling is, especially in combination with TGIS Python objects, for example...

ninsbl avatar Jul 09 '24 01:07 ninsbl