Reducing the size of the PyPI package
-
v2 is not released on PyPI
-
The current PyPI package is about 30 MB in size. Is all of that really necessary?
For comparison, the scipy source tarball is 10 MB, and scipy has far more functionality than Corrfunc.
Most of the 30 MB is the test data in the theory directory.
Thanks for the reminder - I find the repo size too large as well, and I tried to fix that in issue #51. Unfortunately, that only reduced the repo somewhat and didn't quite get it down to an acceptable level.
Looking at the data files again, I see that random_Zspace.ff could potentially be removed and then regenerated during the tests with a fixed seed and a specific random number generator. However, doing so requires changing and re-verifying the known-correct test results.
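As a rough sketch of that idea (the particle count, box size, and seed here are placeholders, not the values behind random_Zspace.ff, and the binary fast-food writer is omitted):

```python
import numpy as np

def regenerate_randoms(npart=1_000_000, boxsize=420.0, seed=42):
    """Deterministically regenerate a random catalog at test time.

    Pinning both the seed and the bit generator (PCG64) fixes the
    random stream across NumPy releases, so the known-correct test
    results would only need to be re-verified once.
    """
    rng = np.random.Generator(np.random.PCG64(seed))
    x = rng.uniform(0.0, boxsize, npart).astype(np.float32)
    y = rng.uniform(0.0, boxsize, npart).astype(np.float32)
    z = rng.uniform(0.0, boxsize, npart).astype(np.float32)
    return x, y, z
```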
Doable, but currently I would rather wait until the paper is submitted, at which point I will update the PyPI package as well (see existing issue #82).
Thanks again for the feedback. Do you mind if I assign v2.1 as the target milestone (rather than v2.0)?
Most of the "weight" columns in the datasets are random too, and could even be generated in-memory and never written to disk. However, I decided not to implement that, partly because it makes comparison with other codes more difficult. Whatever space-saving changes we decide to implement, we should at least keep a write-to-disk mode.
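A minimal sketch of in-memory weights with an optional write-to-disk mode (the function name, seed, and plain-text output format are assumptions for illustration):

```python
import numpy as np

def make_weights(npart, seed=123, outfile=None):
    # Draw the random weight column on the fly with a pinned seed,
    # so every test run sees identical values without shipping them.
    rng = np.random.Generator(np.random.PCG64(seed))
    weights = rng.uniform(0.0, 1.0, npart)
    # Optional write-to-disk mode, kept for comparisons with other codes.
    if outfile is not None:
        np.savetxt(outfile, weights)
    return weights
```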
@lgarrison Good point!
One solution could be to put the test data in a submodule; users could then choose not to run the tests and never download the datasets.
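For example (the directory name and skip marker are hypothetical), the test suite could skip cleanly whenever the data submodule has not been initialized:

```python
import os
import pytest

# Hypothetical location of the test-data submodule within the repo.
DATA_DIR = os.path.join(os.path.dirname(__file__), "..", "test-data")

# Skip any decorated test if the submodule was never checked out.
requires_test_data = pytest.mark.skipif(
    not os.path.isdir(DATA_DIR) or not os.listdir(DATA_DIR),
    reason="test data not found; run `git submodule update --init`",
)

@requires_test_data
def test_theory_dd():
    # ... load catalogs from DATA_DIR and compare to known results ...
    pass
```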
Yes, that's a good approach, but you'll need to find a place to store the test data.
I was simply thinking of opening a separate GitHub repo (under the https://github.com/clusterfunc/ organization -- which is where some future version of Corrfunc is going to live). Do you have a different suggestion?
Good idea. That's what Nick has set up for nbodykit. I used to use Amazon S3, which is indeed less convenient.