huggingface_hub icon indicating copy to clipboard operation
huggingface_hub copied to clipboard

Speed-up tests using `pytest-xdist`

Open Wauplin opened this issue 1 year ago • 5 comments

Github CI for huggingface_hub is doing a lot of good things but I feel tests speed could be improved to get a shorter feedback loop on PRs. An idea to speed-up the tests without changing anything to the logic is to run them in parallel. This is something that pytest-xdist does quite well (see docs). In short, it spawns workers in separate processes and distribute the tests on them. Since it's just a pytest plugin to enable, it makes me think it can be a quick win almost without changing any code/workflow. And it is always possible not to use it for local tests if it doesn't make sense.

Of course, there are a few cons that I see:

  1. In theory all tests should be independent from each other but it might not be exactly the case. I am especially thinking about Repository tests that use a local folder. We would need to check this precisely and if necessary make some arrangement in the tmp folder names. Same goes for temporary repo ids.
  2. There is a risk that the staging/prod hub receives too much load at the same time. Do you think that could be of a problem ? (ping @julien-c @Pierrci @coyotte508)
  3. There is a risk the machine network limits the tests anyway. This is to be tested but in general I'm confident that we can have a speed-up improvement in the CI.

Is this something that have already been discussed before ? What are your first thoughts ? If interested, I can have a deeper look at it.


FYI, I already made a quick benchmark locally using 8 processes for the "Python 3.10 (everything else)" workflow. I did not investigate any of failures but test time has been cut by 5 (3 min instead of 15 min).

HUGGINGFACE_CO_STAGING=1 pytest ./tests/ -k 'not RepositoryTest'

====================================================================================== short test summary info ======================================================================================
FAILED tests/test_repository.py::RepositoryOfflineTest::test_correct_helper - AssertionError: Lists differ: ['osxkeychain', 'get', 'store'] != ['get', 'store']
========================================================== 1 failed, 184 passed, 55 skipped, 38 deselected, 1 warning in 896.55s (0:14:56) ==========================================================

HUGGINGFACE_CO_STAGING=1 pytest ./tests/ -k 'not RepositoryTest' -n 8

============================================================================================== short test summary info ===============================================================================================
FAILED tests/test_repository.py::RepositoryOfflineTest::test_auto_track_binary_files - requests.exceptions.HTTPError: Invalid user token. If you didn't pass a user token, make sure you are properly logged in by ...
FAILED tests/test_repository.py::RepositoryOfflineTest::test_auto_track_binary_files_ignored_with_gitignore - requests.exceptions.HTTPError: Invalid user token. If you didn't pass a user token, make sure you are...
FAILED tests/test_repository.py::RepositoryOfflineTest::test_auto_track_large_files_through_git_add - AssertionError: False is not true
FAILED tests/test_repository.py::RepositoryOfflineTest::test_correct_helper - AssertionError: Lists differ: ['osxkeychain', 'get', 'store'] != ['get', 'store']
FAILED tests/test_repository.py::RepositoryOfflineTest::test_repo_init_checkout_revision - OSError: fatal: unable to read tree ab63aa4a39036f4794bd22e594fec967b94b8cca
FAILED tests/test_repository.py::RepositoryOfflineTest::test_is_tracked_with_lfs_with_pattern - FileNotFoundError: [Errno 2] No such file or directory: '/Users/lucain/projects/huggingface/huggingface_hub/tests/f...
FAILED tests/test_repository.py::RepositoryOfflineTest::test_repo_user - OSError: Looks like you do not have git installed, please install.
FAILED tests/test_repository.py::RepositoryOfflineTest::test_no_branch_checked_out_raises - ValueError: If not specifying `clone_from`, you need to pass Repository a valid git clone.
ERROR tests/test_repository.py::RepositoryOfflineTest::test_is_tracked_with_lfs_with_pattern - subprocess.CalledProcessError: Command '['git', 'tag', '-l']' returned non-zero exit status 128.
ERROR tests/test_repository.py::RepositoryOfflineTest::test_repo_user - FileNotFoundError: [Errno 2] No such file or directory: '/Users/lucain/projects/huggingface/huggingface_hub/tests/fixtures/working_repo_2'
ERROR tests/test_repository.py::RepositoryOfflineTest::test_no_branch_checked_out_raises - subprocess.CalledProcessError: Command '['git', 'tag', '-d', b'v0.0.1\nv0.0.10\nv0.0.11\nv0.0.12\nv0.0.13\nv0.0.14\nv0.0...
==================================================================== 8 failed, 177 passed, 55 skipped, 3 warnings, 3 errors in 170.12s (0:02:50) =====================================================================

Wauplin avatar Aug 09 '22 10:08 Wauplin

There is a risk that the staging/prod hub receives too much load at the same time. Do you think that could be of a problem ? (ping @julien-c @Pierrci @coyotte508)

No I don't think this is a problem, the tests hit staging, and I don't think it will cause load problems if it's just 8 parallel tests.

Btw we also added parallel tests on moonlanding (3x), so I don't think anyone is against the idea (at least from moonlanding)

One possible issue, is if some tests log in / out a user, or create a repo and then expect a certain number of repos, it could mess up w/ parallel tests (logging out a user logs out all sessions of the user). Or if parallel tests use the same repo name.

But given that we probably have tests already running in parallel when multiple CI run at the same time, maybe there's no problems? Not familiar at all with the hub tests, talking from a "moonlanding" point of view.

coyotte508 avatar Aug 09 '22 16:08 coyotte508

Note that if i remember correctly it's tests/test_repository.py in particular that's slow, and I think some tests there are not really necessary and could be pruned (testing several times the same thing, with multiple repo types (i think))

Just wanted to note this even though it's orthogonal to this issue's subject

julien-c avatar Aug 09 '22 18:08 julien-c

(we do use pytest-xdist in transformers if i remember correctly)

julien-c avatar Aug 09 '22 18:08 julien-c

All for using pytest-xdist if the server can handle it. As said above we'll likely have to do a few changes regarding repository management and user sessions, but it should be pretty lightweight.

(we do use pytest-xdist in transformers if i remember correctly)

We do, with -n 2 I believe.

LysandreJik avatar Aug 10 '22 07:08 LysandreJik

Great ! Thanks for giving more context. I'll address it when I have time and make sure to look at the potential issues you mentioned @coyotte508 and @LysandreJik .

@julien-c I keep it in mind for the repository tests but I'll wait at least until I have worked a bit on the repository module before refactoring its tests 😄

Wauplin avatar Aug 10 '22 15:08 Wauplin