Remove test data from PyPI package

Open cbourjau opened this issue 1 year ago • 8 comments

Describe the bug

The ONNX package on PyPI contains all test files found at https://github.com/onnx/onnx/tree/main/onnx/backend/test/data. These amount to ~40MB unpacked, or more than 70% of the total package size.
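
One way to reproduce this kind of measurement is to sum the uncompressed sizes recorded in the wheel's zip index. The following is a minimal sketch, not the procedure used for the numbers above; the helper name `data_share` is hypothetical, and the prefix is assumed to match the repository path linked above.

```python
import zipfile

# Assumed location of the test data inside the wheel, mirroring the repo layout.
TEST_DATA_PREFIX = "onnx/backend/test/data/"


def data_share(wheel, prefix=TEST_DATA_PREFIX):
    """Return (bytes under prefix, total bytes, fraction), using uncompressed sizes.

    `wheel` may be a path to a .whl file or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as zf:
        # Directory entries carry no payload, so skip them.
        infos = [info for info in zf.infolist() if not info.is_dir()]
    total = sum(info.file_size for info in infos)
    under = sum(info.file_size for info in infos if info.filename.startswith(prefix))
    fraction = under / total if total else 0.0
    return under, total, fraction
```

Calling `data_share` with the path to a locally downloaded wheel returns the byte counts and the fraction taken up by the test data.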

System information

I checked the 1.15.0 macOS wheel, but judging by the compressed file sizes, all platforms are affected: https://pypi.org/project/onnx/#files

Expected behavior

These test files should not be installed in a production environment.

Other notes

Are there any plans to move those binary files out of git / generate them on the fly?

cbourjau avatar Feb 09 '24 14:02 cbourjau

+1 on this. Otherwise the ONNX PyPI package will grow significantly as backend tests are added for more ops. See related discussion in this issue. For now, some users might still need the static backend tests from the package; in that case ONNX could ship two packages, one with test data and one without. Personally, though, I feel ONNX should eventually stop providing them through PyPI and just let users generate them on the fly.

jcwchen avatar Feb 09 '24 17:02 jcwchen

We should.

justinchuby avatar Feb 09 '24 21:02 justinchuby

One advantage of distributing the test data, of course, is that runtimes do not need the Python toolchain (protobuf python, numpy, etc.) to run tests.

justinchuby avatar Feb 10 '24 06:02 justinchuby

When you say "distributing", do you mean shipping them in the PyPI package or having them in the repository? I have a hard time seeing a use case where a downstream project would rather fish the test files out of the PyPI package than use a git submodule.

cbourjau avatar Feb 12 '24 11:02 cbourjau

Ah, you are right. We can check those in without distributing them with the Python package.

justinchuby avatar Feb 13 '24 05:02 justinchuby

I think the best way forward is to move the onnx/onnx/backend/test folder out of the Python package. While we are at it, we may want to do the same with onnx/onnx/backend/sample.

cbourjau avatar Feb 13 '24 10:02 cbourjau

I took a closer look at this issue. Unfortunately, the lines between tests that should not be packaged, test utilities, and the reference implementation are blurry. Moving the "tests" out of onnx/ is quite a large and technically breaking change. The least invasive way to keep those test files out of the installed package is to simply exclude them from the final package, as done in #5970.
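
To illustrate the idea only (this is not the change made in #5970, and the helper name `files_to_package` is hypothetical), excluding the data at packaging time could look roughly like filtering the collected file list by directory while leaving the test utilities next to it untouched:

```python
from pathlib import PurePosixPath

# Directory whose contents should stay out of the built package (assumed path).
EXCLUDED_DIR = PurePosixPath("onnx/backend/test/data")


def files_to_package(paths, excluded=EXCLUDED_DIR):
    """Return the paths that are not located under the excluded directory."""
    kept = []
    for p in paths:
        # A file lives under `excluded` if that directory is one of its ancestors.
        if excluded in PurePosixPath(p).parents:
            continue
        kept.append(p)
    return kept
```

Only files below the data directory are dropped; sibling modules under onnx/backend/test remain packaged, which sidesteps the question of where tests end and utilities begin.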

cbourjau avatar Feb 27 '24 22:02 cbourjau