iranlowo
Improve BIG file dependencies
We need to fix the big file dependencies in this project:

- The pre-trained ADR model (binary) is an 88MB file living in the model folder. This makes for a very heavy upload/download from PyPI.
- The `torch` dependency in `requirements.txt` by default pulls down the GPU version of torch. This makes integration with Heroku and RTD difficult/impossible because of hard size limits. It would be better to integrate and use a CPU-only version. Is this compatible with Travis CI and `requirements.txt`? (See the sketch below.)
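One way to answer the CPU-only question, assuming we pin a torch version that publishes `+cpu` wheels (the version number below is illustrative): `requirements.txt` accepts a `-f`/`--find-links` line, which plain `pip install -r` (and therefore Travis CI) understands.

```
# requirements.txt -- sketch: CPU-only torch pin (version number is illustrative)
# The -f line adds PyTorch's wheel index, where the +cpu builds are hosted.
-f https://download.pytorch.org/whl/torch_stable.html
torch==1.1.0+cpu
```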
To facilitate all this:

- All the ADR pre-trained models live in this Bintray artifactory.
- Is there some clever way (or post-install script) by which we can download them locally as needed? (A sketch follows this list.)
- The upside is that the iranlowo download is fast/small, and then you can separately pull down the models to do inference/prediction.
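A minimal sketch of the download-as-needed idea; the Bintray URL, file name, and cache directory below are hypothetical placeholders:

```python
import os
import urllib.request

# Hypothetical Bintray location of the pre-trained ADR model.
MODEL_URL = "https://dl.bintray.com/ORG/REPO/adr_model.pt"  # placeholder
CACHE_DIR = os.path.join(os.path.expanduser("~"), ".iranlowo", "models")

def fetch_model(url=MODEL_URL, cache_dir=CACHE_DIR):
    """Download the model once; reuse the cached copy on later calls."""
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url, local_path)
    return local_path
```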
A possible workaround here would be to have a standalone repository for models. So, if a user needs any functionality tied to a model, a check runs to see whether they've cloned/downloaded the model; if not, an error is raised (see the sketch below). This is how I've seen a lot of projects handle this challenge. On the Travis end, we can have it clone that same repository each time a test needs to be run. The major challenge here is having the user do multiple installs.
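A sketch of that guard, assuming a hypothetical `MODEL_DIR` where the cloned model repository is expected to live:

```python
import os

# Hypothetical path where the user clones/downloads the model repository.
MODEL_DIR = os.environ.get("IRANLOWO_MODEL_DIR", "models/")

def require_model(name):
    """Raise a helpful error if the named model file is missing."""
    path = os.path.join(MODEL_DIR, name)
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"Model '{name}' not found under {MODEL_DIR}. "
            "Clone or download the model repository first."
        )
    return path
```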
I'm not very familiar with torch as I use keras more, but why haven't we considered zipping the file yet? Is that going to reduce performance somehow? If not, it would solve the challenge of having to do multiple installs.
Let me tackle matters in order:
- Regarding "standalone repository for models": I've been saving the models here because pre-optimization (April 2019 time frame), the models were 200MB and too big for GitHub. I listed the link in the top post above ☝️

  > all the ADR pre-trained models live in this Bintray artifactory
- Regarding zipping the file: I optimized the size of the pytorch model. See this issue. It basically removed the "intermediate back-propagation information" needed to continue training from a particular model checkpoint. I don't think additional optimization will gain much, but that is another experiment, to see what the exact compression factor is (sketched below).
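For reference, a sketch of both points: saving only the `state_dict` is what drops the back-propagation/optimizer bookkeeping, and a quick gzip pass would tell us the exact compression factor (the file name is illustrative):

```python
import gzip
import os
import shutil

# Saving only the weights, e.g. torch.save(model.state_dict(), "adr_model.pt"),
# is what removed the intermediate back-propagation information.

def gzip_compression_factor(path):
    """Gzip a file on disk and return original_size / compressed_size."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(path) / os.path.getsize(gz_path)

print(gzip_compression_factor("adr_model.pt"))  # illustrative file name
```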
- Finally, back in April when I was trying to get things started, the 200MB model wasn't going to go onto GitHub and I was using the Bintray artifactory, so I thought perhaps I could add a pre-install step to `setup.py`. I asked this question on the repo of the setupmeta project used to ease setup, and the answer is yes: you can use a pre/post-install step to programmatically download from the artifactory. So that is the path I think we need to explore. I can tackle this next week; I think it'll take some experimentation (trial & error) to ensure that things work smoothly. This is the StackOverflow thread with more details/instructions to implement (sketched below).
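A sketch of that post-install step, using setuptools' standard `cmdclass` override; `iranlowo.downloads.fetch_model` is the hypothetical helper sketched earlier in this thread:

```python
# setup.py -- sketch of a post-install model download hook
from setuptools import setup
from setuptools.command.install import install

class PostInstall(install):
    """Run the normal install, then pull down the pre-trained models."""
    def run(self):
        install.run(self)
        # Hypothetical helper from earlier in this thread.
        from iranlowo.downloads import fetch_model
        fetch_model()

setup(
    name="iranlowo",
    cmdclass={"install": PostInstall},
    # ... other metadata ...
)
```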