Reducing the size of the PyPI package
-
v2 is not released on PyPI
-
The current PyPI package is about 30 MB in size. Is all of that really necessary?
For comparison, the scipy source tarball is 10 MB, and scipy has far more functionality than Corrfunc.
Most of the 30 MB is the test data in the theory directory.
Thanks for the reminder - I find the repo size too large as well, and I tried to fix that in issue #51. Unfortunately, that only reduced the repo somewhat and didn't quite get it down to an acceptable level.
Looking at the data files again, I see that random_Zspace.ff could potentially be removed and then regenerated during the tests with a fixed seed and a specific random number generator. However, doing so requires changing and re-verifying the known-correct test results.
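As a rough sketch of that idea (the particle count, box size, and seed here are placeholders, not the values behind random_Zspace.ff, and the binary fast-food writer is omitted):

```python
import numpy as np

def regenerate_randoms(npart=1_000_000, boxsize=420.0, seed=42):
    """Deterministically regenerate a random catalog at test time.

    Pinning both the seed and the bit generator (PCG64) fixes the
    random stream across NumPy releases, so the known-correct test
    results would only need to be re-verified once.
    """
    rng = np.random.Generator(np.random.PCG64(seed))
    x = rng.uniform(0.0, boxsize, npart).astype(np.float32)
    y = rng.uniform(0.0, boxsize, npart).astype(np.float32)
    z = rng.uniform(0.0, boxsize, npart).astype(np.float32)
    return x, y, z
```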
Doable, but currently I would rather wait until the paper is submitted, at which point I will update the PyPI package as well (see existing issue #82).
Thanks again for the feedback. Do you mind if I assign v2.1 as the target milestone (rather than v2.0)?
Most of the "weight" columns in the datasets are random too, and could even be generated in-memory and never written to disk. However, I decided not to implement that, partly because it makes comparison with other codes more difficult. Whatever space-saving changes we decide to implement, we should at least keep a write-to-disk mode.
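A minimal sketch of in-memory weights with an optional write-to-disk mode (the function name, seed, and plain-text output format are assumptions for illustration):

```python
import numpy as np

def make_weights(npart, seed=123, outfile=None):
    # Draw the random weight column on the fly with a pinned seed,
    # so every test run sees identical values without shipping them.
    rng = np.random.Generator(np.random.PCG64(seed))
    weights = rng.uniform(0.0, 1.0, npart)
    # Optional write-to-disk mode, kept for comparisons with other codes.
    if outfile is not None:
        np.savetxt(outfile, weights)
    return weights
```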
@lgarrison Good point!
One solution could be to put the test data in a submodule; users could then choose not to run the tests and never download the datasets.
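For example (the directory name and skip marker are hypothetical), the test suite could skip cleanly whenever the data submodule has not been initialized:

```python
import os
import pytest

# Hypothetical location of the test-data submodule within the repo.
DATA_DIR = os.path.join(os.path.dirname(__file__), "..", "test-data")

# Skip any decorated test if the submodule was never checked out.
requires_test_data = pytest.mark.skipif(
    not os.path.isdir(DATA_DIR) or not os.listdir(DATA_DIR),
    reason="test data not found; run `git submodule update --init`",
)

@requires_test_data
def test_theory_dd():
    # ... load catalogs from DATA_DIR and compare to known results ...
    pass
```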
Yes, that's a good approach, but you'll need to find a place to store the test data.
I was simply thinking of opening a separate GitHub repo (under the https://github.com/clusterfunc/ organization -- which is where some future version of Corrfunc is going to live). Do you have a different suggestion?
Good idea. That's what Nick has set up for nbodykit. I used to use Amazon S3, which is indeed less convenient.