BioSimSpaceTutorials

This repository is massive

Open lohedges opened this issue 3 years ago • 5 comments

Are all of the files in this repository strictly necessary for the purposes of the tutorials? I've just done a fresh clone and see the following...

The compressed repository size:

git bundle create tmp.bundle --all 
du -sh tmp.bundle 
522M    tmp.bundle

The unpacked size:

du -h --max-depth=1 
2.0G    ./04_fep
3.8M    ./01_introduction
523M    ./.git
27M     ./03_steered_md
372K    ./LIVECOMS
141M    ./02_funnel_metad
3.2G    .

Files for the FEP section are 2 GB in size! Do we really need absolutely everything here? If this repository is intended to be persistent for the purposes of a living journal, then it will quickly become unwieldy if we make edits or add additional files. This will also cause issues when packaging these tutorials in a minimal Docker image for hosting on a cloud service.

lohedges, Nov 08 '22 11:11
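
The per-directory `du` audit above can be narrowed to individual files, which may help when deciding what is actually needed. The following is a minimal sketch, not part of the repository: the function name `largest_files` and its parameters are hypothetical, and it assumes that skipping `.git` and symbolic links is what is wanted.

```python
import os


def largest_files(root, top=10, skip_dirs=(".git",)):
    """Return up to `top` (size_in_bytes, relative_path) pairs, largest first.

    Hypothetical helper for illustration; directories named in `skip_dirs`
    are not descended into, and symbolic links are ignored.
    """
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Print the ten largest files under the current directory, in MiB.
    for size, path in largest_files("."):
        print(f"{size / 2**20:8.1f} MiB  {path}")
```

Run from the top of a fresh clone, this would list candidates for pruning or offloading without counting the Git history itself.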

No, probably not, but currently I don't know which bits are essential and which bits aren't. I am OK with squashing history once we are done with a version that is OK. I just migrated what was there for now with no regard for need. I can go through the FEP tutorial, but maybe @annamherz can do an initial trim?

ppxasjsm, Nov 08 '22 16:11

We could migrate beefy input files/trajectories to a URL and update the notebooks to wget the files when needed. That should trim things down considerably. This would help as in the future I anticipate updates to the tutorial suite when we push new major/minor releases of BioSimSpace. We can sort this out when all the content has been reviewed/approved.

jmichel80, Nov 08 '22 16:11
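
The download-on-demand idea above could look roughly like the sketch below when used from a notebook cell. It uses Python's standard `urllib.request.urlretrieve` rather than calling `wget`, which is a substitution made here for portability. The function name `fetch_if_missing` is hypothetical, and no hosting URL has been decided in this thread, so any URL passed to it is an assumption.

```python
import urllib.request
from pathlib import Path


def fetch_if_missing(url, dest):
    """Download `url` to `dest` unless `dest` already exists.

    Hypothetical helper for illustration. Returns the local path as a Path.
    """
    dest = Path(dest)
    if dest.exists():
        return dest  # already downloaded; skip the network round trip
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # does not leave a truncated file that looks complete.
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(dest)
    return dest
```

A notebook would call this once at the top of the cell that needs the data, for example `fetch_if_missing(some_url, "inputs/trajectory.dcd")`, where both the URL and the file name are placeholders. Repeated runs reuse the local copy, so the repository itself only needs to carry the notebooks and small inputs.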

I'll have a look through the files. I think we can get rid of all the prep files actually; they were mainly to show the outcomes of a ligprep script, and not really necessary for running the tutorials themselves.

annamherz, Nov 08 '22 16:11

I think a lot of this is due to the ABFE example output energies. These are already compressed, but are far bigger than needed as energies haven't been subsampled according to statistical inefficiency. I'll cut these down later.

fjclark, Nov 08 '22 16:11
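
For readers unfamiliar with the subsampling mentioned above, here is a minimal pure-Python sketch of the idea. It is not the procedure used for the ABFE outputs in this repository. It estimates the statistical inefficiency g from the normalised autocorrelation function, truncating the sum at the first non-positive value (one common choice among several), and then keeps every ceil(g)-th sample. The function names are hypothetical; in practice a dedicated library such as pymbar's timeseries utilities would likely be preferable.

```python
import math


def statistical_inefficiency(series):
    """Estimate g = 1 + 2 * sum_t (1 - t/n) * C(t) for a time series.

    C(t) is the normalised autocorrelation at lag t. The sum stops at the
    first lag where C(t) <= 0, a simple truncation assumed for this sketch.
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0.0:
        return 1.0  # a constant series carries no correlation information
    g = 1.0
    for t in range(1, n):
        cov = sum(
            (series[i] - mean) * (series[i + t] - mean) for i in range(n - t)
        ) / (n - t)
        c = cov / var
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def subsample(series):
    """Keep every ceil(g)-th sample so retained samples are roughly uncorrelated."""
    stride = math.ceil(statistical_inefficiency(series))
    return series[::stride]
```

Strongly correlated energy series give a large g and are thinned heavily, while uncorrelated or anti-correlated series give g of 1 and are kept in full. The direct autocorrelation loop is quadratic in the series length, so this is meant to show the logic rather than to process long trajectories.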

Thanks, all. As @ppxasjsm says, we can decide what to do (pruning old files, squashing, offloading input/output to some webhosting) once all of the text / notebooks are ready. It would just be good to have a think about what files are really necessary while doing so.

Cheers.

lohedges, Nov 08 '22 16:11