Oversized GitHub repository

Open michaelchin opened this issue 2 years ago • 5 comments

The size of this repository is 914M as of 13 Jun 2023. Noting the problem here so I don't forget.

du -sh -- * .[^.]* | sort -h

Screenshot 2023-06-13 at 6 05 30 pm

To skip the history, do a shallow clone:

git clone --depth 1 --branch master https://github.com/GPlates/gplately.git

michaelchin avatar Jun 13 '23 07:06 michaelchin

All of this is taken up by graphics and animations in the notebooks. I would like to clear these from all notebooks in the repo and flush them from the git history as well.

I've looked at a few solutions in the past but none were fantastic. Running jupyter nbconvert in a git hook might be one way. Alternatively, we could use jupytext to properly version control notebooks as Markdown files (or regular Python files).
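For reference, the git-hook idea could look roughly like the sketch below. It assumes `jupyter nbconvert` is installed and on the PATH; the function name `clear_staged_notebooks` is made up for illustration, and none of this has been tried against this repository.

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-commit sketch (assumes jupyter nbconvert is installed).

clear_staged_notebooks() {
    # List staged notebooks that were added, copied or modified.
    git diff --cached --name-only --diff-filter=ACM -- '*.ipynb' |
    while IFS= read -r nb; do
        # Strip outputs in place, then re-stage the cleaned file.
        # Caveat: re-staging the whole file also stages any unstaged edits to it.
        jupyter nbconvert --clear-output --inplace "$nb" || exit 1
        git add -- "$nb" || exit 1
    done
}

# Run only inside a git work tree (always the case for a real hook).
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    clear_staged_notebooks
fi
```

A hook like this only keeps outputs out of new commits. It does nothing about the outputs already stored in the history.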

brmather avatar Jun 19 '23 16:06 brmather

Just a note to myself: use BFG to purge the history. https://rtyley.github.io/bfg-repo-cleaner/#download

michaelchin avatar Jun 14 '24 01:06 michaelchin

Putting this on ice until it causes real problems or becomes too painful to bear.

michaelchin avatar Jun 24 '24 01:06 michaelchin

> use BFG to purge history https://rtyley.github.io/bfg-repo-cleaner/#download

Most of the space is taken up by Jupyter notebook outputs (images, GIFs, etc.). Does BFG help with this?

I've found a couple more resources which may help with Jupyter notebooks:

https://zhauniarovich.com/post/2020/2020-10-clearing-jupyter-output-p3/
https://www.scivision.dev/git-jupyter-strip-output/
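
A git clean filter is another common way to keep outputs out of new commits; it may or may not match what the linked posts describe. A minimal sketch, assuming `jupyter nbconvert` is installed; the filter name `strip-notebook-output` and the function name `install_notebook_filter` are arbitrary choices for illustration:

```shell
#!/bin/sh
# Sketch only: call install_notebook_filter from the root of a clone.

install_notebook_filter() {
    # Clean filter: git pipes each notebook through this command when staging,
    # so the committed blob has no outputs while the working copy keeps them.
    git config filter.strip-notebook-output.clean \
        'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
    # Route every notebook through the filter.
    printf '%s\n' '*.ipynb filter=strip-notebook-output' >> .gitattributes
}
```

The `.gitattributes` entry can be committed, but the `git config` setting is local, so each contributor would need to run this once per clone. Like a hook, a filter only affects future commits.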

brmather avatar Jul 11 '24 02:07 brmather

> > use BFG to purge history https://rtyley.github.io/bfg-repo-cleaner/#download
>
> Most of the space is taken up by Jupyter notebook outputs (images, GIFs, etc.). Does BFG help with this?
>
> I've found a couple more resources which may help with Jupyter notebooks:
>
> https://zhauniarovich.com/post/2020/2020-10-clearing-jupyter-output-p3/
> https://www.scivision.dev/git-jupyter-strip-output/

BFG only purges history, and the history may take up some space. I think we can try your suggestions first. This issue is not too painful for now.

Anyway, people can always skip the history with a shallow clone:

git clone --depth 1 --branch master https://github.com/GPlates/gplately.git

See the screenshot below. The history is not a big problem anymore.

du -sh -- * .[^.]* | sort -h

Screenshot 2024-07-11 at 1 54 10 PM

michaelchin avatar Jul 11 '24 03:07 michaelchin

Converting this to a discussion until we decide on an actionable plan.

michaelchin avatar Oct 12 '24 05:10 michaelchin