🐌 Slow tests

Open blaginin opened this issue 2 years ago • 4 comments

  • TIA Toolbox version: 1.4.0
  • Python version: 3.10.8
  • Operating System: Mac OS 13.3.1 (arm)

Description

Tests in the toolbox are extremely slow. Even with CI=True, all the checks take almost half an hour, and non-CI mode takes even longer. This makes interactive development, where tests are run after each small change, impossible, which in turn leads to poorly written test code.

The main reason the tests are slow is static files. Each test run downloads and then deletes gigabytes of model weights and test images, even though those files almost always remain unchanged after the tests have run.

Not only is this slow, but it also triggers a lot of false positive results. For example, that's the typical result of my local run:

image

Todo

  • [x] Fixtures caching
  • [x] Models caching
  • [x] Cache in CI
  • [x] Parallel tests
  • [ ] Safe downloads
  • [ ] Cache rechecks (HTTP headers / ETag)

blaginin avatar May 08 '23 16:05 blaginin

Of course, we want the tests to be reproducible and don't want them to fail because earlier code has modified our assets. But maybe we can consider more user-friendly approaches? I have these options in mind:

  • Compare a file's local modification time with the remote one and download only if the remote copy is newer. This can be done with the If-Range / Last-Modified headers, which the tiatoolbox.dcs.warwick.ac.uk server already supports.
  • Store file checksums on the server side and download a file only if its checksum differs from the local copy's
  • Copy files from the local cache folder into the tests' sandbox each time tests are run
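A rough sketch of the first and third options, using only the standard library. The function names, the cache layout, and the way the Last-Modified value is obtained (for example from a HEAD request) are assumptions for illustration, not anything that exists in tiatoolbox today.

```python
import shutil
from email.utils import parsedate_to_datetime
from pathlib import Path


def remote_is_newer(local_path: Path, last_modified: str) -> bool:
    """Decide whether a file should be (re)downloaded.

    `last_modified` is the raw value of the server's Last-Modified
    header, e.g. "Mon, 08 May 2023 16:05:00 GMT" (hypothetical input).
    """
    if not local_path.exists():
        return True  # nothing cached yet, so a download is needed
    remote_ts = parsedate_to_datetime(last_modified).timestamp()
    # Download only when the server copy changed after the local copy.
    return remote_ts > local_path.stat().st_mtime


def stage_from_cache(cache_dir: Path, sandbox_dir: Path, name: str) -> Path:
    """Copy a cached asset into the test sandbox.

    Tests then work on the copy, so they cannot corrupt the cache.
    """
    sandbox_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(cache_dir / name, sandbox_dir / name))
```

With something like this, a test run would only hit the network when `remote_is_newer` returns True, and each test would receive its own copy via `stage_from_cache`. Copying is still far cheaper than downloading, and it keeps the reproducibility guarantee.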

blaginin avatar May 08 '23 16:05 blaginin

@John-P @shaneahmed, maybe you have other ideas?

blaginin avatar May 08 '23 16:05 blaginin

It looks like there are a few issues to address here:

  • The remote samples fixture is set to be scoped for the session, but it appears to be using a separate tmp path per test.
  • Large files need not be re-downloaded between runs if the HTTP Last-Modified header (assuming the server sends it correctly) is earlier than the modification date of the local copy.
  • Downloads could be verified with a checksum, which is not currently done.
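The checksum point could look roughly like the sketch below. It assumes an expected SHA-256 digest is published somewhere for each file (that source, and the function names, are hypothetical); the hashing itself uses the standard `hashlib` module.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file exists and matches the expected digest."""
    return path.exists() and sha256_of(path) == expected_sha256.lower()
```

A failed check would signal a truncated or corrupted download, which is one likely source of the spurious failures described above, and the file could then be fetched again instead of being passed to the tests.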

John-P avatar May 12 '23 09:05 John-P

@blaginin As John mentioned, downloads should happen once per session. Please let us know if this is not the case. Feel free to create a PR to fix this.

shaneahmed avatar May 12 '23 11:05 shaneahmed