Remove binary files from repository

Open agriyakhetarpal opened this issue 1 year ago • 3 comments

Similar to @Carreau's suggestion in #139: we should remove the existing wheel files from tests/test_data/wheel/, since they will only bloat the repository as we add more commits to the default branch. They can be stored elsewhere.

Yes, maybe let's discuss it in a separate issue. I don't like putting a bunch of big files in the repository for testing, but I am also not sure whether storing them in a separate place is a good way to go.

Originally posted by @ryanking13 in https://github.com/pyodide/micropip/issues/139#issuecomment-2537775135


  • My idea is to use a dummy repository and add the wheels to a GitHub release, and then download them when running the tests using https://github.com/fatiando/pooch (which can also cache the files). It is used by the scikit-image and scikit-learn test suites to download data files (from SciPy's datasets, in the case of the latter), and I use it as well, for PyBaMM. However, this does store the files in a separate place, which might not be what we want.

  • Another approach that avoids storing them elsewhere is to remove them in a commit and keep downloading them through the raw GitHub permalinks that point at the files as of the commit before the removal. This has the added benefit that GitHub also sets CORS headers on such URLs. However, we won't be able to update the files as easily with this method.
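To make the second option concrete, here is a minimal sketch of how a commit-pinned raw URL could be built. The helper name `raw_permalink` and the commit SHA are placeholders, and the assumption that the wheel sits directly under tests/test_data/wheel/ is mine; the real SHA would be the last commit that still contains the wheels.

```python
from urllib.parse import quote


def raw_permalink(owner: str, repo: str, commit: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL pinned to a single commit.

    Pinning to a commit SHA (rather than a branch name) means the URL keeps
    resolving after the file is deleted from the default branch.
    """
    return (
        "https://raw.githubusercontent.com/"
        f"{owner}/{repo}/{quote(commit)}/{quote(path)}"
    )


# Placeholder SHA for illustration only, not a real micropip commit.
WHEEL_COMMIT = "0123456789abcdef0123456789abcdef01234567"

url = raw_permalink(
    "pyodide",
    "micropip",
    WHEEL_COMMIT,
    "tests/test_data/wheel/test_wheel_uninstall-1.0.0-py3-none-any.whl",
)
print(url)
```

Because the URL is tied to one commit, updating a wheel would mean committing it again, removing it again, and bumping the SHA, which is the drawback mentioned above.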

agriyakhetarpal • Dec 12 '24 09:12

I think two of these wheels are already on PyPI, so we don't need to store them again; we can "just" store a hash and re-download them, checking the hash.
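A rough sketch of the "store a hash and re-download" idea, using only the standard library. The names `sha256_of` and `fetch_verified` are hypothetical, and the URL and digest in the usage comment are placeholders rather than real values for any of the test wheels.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(url: str, expected_sha256: str, dest: Path) -> Path:
    """Download url to dest unless a copy with the expected hash already exists.

    Raises ValueError (and deletes the file) if the download does not match.
    """
    if dest.exists() and sha256_of(dest) == expected_sha256:
        return dest  # cached copy is still valid
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    actual = sha256_of(dest)
    if actual != expected_sha256:
        dest.unlink()
        raise ValueError(f"hash mismatch for {url}: got {actual}")
    return dest


# Usage in a test fixture (placeholder URL and digest):
# wheel = fetch_verified(
#     "https://files.pythonhosted.org/packages/.../some_package-1.0-py3-none-any.whl",
#     "<sha256 recorded in the repository>",
#     Path(".cache/wheels/some_package-1.0-py3-none-any.whl"),
# )
```

Only the file name and its digest would need to live in the repository, which keeps the test data reviewable in a diff.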

test_wheel_uninstall-1.0.0-py3-none-any.whl is small, and I think the problem I was pointing out was that it's not auditable. I think it would be OK to store it in unpacked form (as plain files), then zip it and rename it during the tests.
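One way the "zip and rename during the tests" step might look, assuming the wheel's contents are checked in as a plain directory tree. The function name `build_wheel` and the directory layout in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def build_wheel(src_dir: Path, wheel_path: Path) -> Path:
    """Zip every file under src_dir into wheel_path, using paths relative to src_dir.

    wheel_path should live outside src_dir, otherwise the archive would try
    to include itself.
    """
    wheel_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(wheel_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(src_dir.rglob("*")):
            if file.is_file():
                # Wheel archives use forward slashes for member names.
                zf.write(file, file.relative_to(src_dir).as_posix())
    return wheel_path


# Usage in a test fixture (hypothetical layout):
# wheel = build_wheel(
#     Path("tests/test_data/wheel_src/test_wheel_uninstall"),
#     tmp_path / "test_wheel_uninstall-1.0.0-py3-none-any.whl",
# )
```

Since a wheel is an ordinary zip archive with a specific file name, writing the archive directly under the `.whl` name covers both the zip and the rename step, and the checked-in files stay readable in code review.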

Carreau • Dec 13 '24 09:12

Thanks for the suggestion! pooch looks interesting and I like the point that it can cache files. One thing that I am worried about is that if we remove the remote files for some reason (or if the URL changes for some reason), the test will break, and users will not be able to handle it easily.

But we already rely quite heavily on GitHub infrastructure (for storing the xbuildenv and the metadata), so everything will break if there is an issue with GitHub anyway. So it would be fine to use GitHub to store the binary files.

My idea is to use a dummy repository and add the wheels to the GitHub release, and then download them at the time of running the tests using https://github.com/fatiando/pooch (which can cache the files as well).

I think two of these wheels are already on PyPI, so we don't need to store them again; we can "just" store a hash and re-download them, checking the hash.

Yeah, I think we can start by downloading them from PyPI. test_wheel_uninstall-1.0.0-py3-none-any.whl can be replaced with any other package that has a reasonably complex file structure, so I think we can replace it with a real package from PyPI.

ryanking13 • Dec 13 '24 12:12

Yes, pooch can be overkill if we don't have a lot of test data and if the files are small. We can also archive the dummy repository so that no one except an administrator will be able to remove the remote files from the release.

agriyakhetarpal • Dec 13 '24 18:12