distributed-learning-contributivity

The repo is way too heavy!

Open natct10 opened this issue 5 years ago • 8 comments

The repo is now 1.1 GB, which is not okay. A dataset has likely been added somewhere. I will investigate, but any help is welcome. See for yourself (the size key, reported in kilobytes): curl https://api.github.com/repos/SubstraFoundation/distributed-learning-contributivity

natct10, Oct 26 '20 09:10

You can even use: curl https://api.github.com/repos/SubstraFoundation/distributed-learning-contributivity 2> /dev/null | grep size | tr -dc '[:digit:]'

natct10, Oct 26 '20 09:10
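
The size check from the two comments above can be wrapped in a small helper. This is only a minimal sketch: the function name repo_size_mb is hypothetical, and it assumes the API returns pretty-printed JSON with one key per line and a size value expressed in kilobytes.

```shell
# repo_size_mb (hypothetical helper): read a GitHub "repos" API response on
# stdin and print the repository size in MB.
# Assumes pretty-printed JSON with one key per line and "size" in kilobytes.
repo_size_mb() {
  grep -m1 -E '^[[:space:]]*"size":' |   # keep only the first "size" line
    tr -dc '[:digit:]' |                 # strip everything but the digits
    awk '{ printf "%.1f\n", $1 / 1024 }' # kilobytes -> megabytes
}

# Usage (needs network access):
#   curl -s https://api.github.com/repos/SubstraFoundation/distributed-learning-contributivity | repo_size_mb
```

Anchoring the match on the "size" key avoids concatenating digits from unrelated lines, which a plain grep size followed by tr could do if another key happened to contain the word.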

I deleted PVRL, Moving-functions, Add-Imdb-dataset[...] and dvrl. All these branches had either been dropped or rebased into another branch that has since been merged.

arthurPignet, Oct 26 '20 12:10

Great, thank you @arthurPignet! But the repo size seems to remain unchanged :/

natct10, Oct 26 '20 15:10

Hello! I looked into this problem a bit and found this. The extra size seems to come from the .git/objects/pack/ folder. Here you can find an explanation of what it is.

This command shows that the pack contains some heavy files: git verify-pack -v .git/objects/pack/pack-*.pack | grep -v chain | sort -k3nr | head

So I tried to identify which files are so heavy. I ran this command: git rev-list --objects --all | grep "$(git verify-pack -v .git/objects/pack/*.pack | sort -k 3 -n | tail -10 | awk '{print$1}')"

Here are the results: (image)

So it seems that we saved models in folders which were not ignored. I hope that helps :)

celinejacques, Nov 02 '20 14:11
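
The lookup described in the comment above can also be done in one pass, with sizes and paths side by side. The sketch below is an alternative to the verify-pack pipeline, not the method used in the thread: it relies on git cat-file --batch-check, and the function names largest_blobs and list_largest_blobs are hypothetical.

```shell
# largest_blobs [N] (hypothetical helper): read "type sha size path" lines on
# stdin, keep only blobs, and print the N largest as "size<TAB>path".
largest_blobs() {
  awk '$1 == "blob" {
         size = $3
         sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")   # drop type, sha and size; keep the path
         print size "\t" $0
       }' |
    sort -k1,1nr |
    head -n "${1:-10}"
}

# list_largest_blobs [N]: run inside a clone to list the N largest blobs
# reachable from any ref, including files deleted in later commits.
list_largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    largest_blobs "${1:-10}"
}
```

Calling list_largest_blobs 10 from the repository root prints uncompressed blob sizes in bytes, so the numbers will be larger than the packed sizes reported by git verify-pack.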

By the way, we really should separate the code from its outputs (reports), which could be hosted on this open-science platform: https://osf.io/. This would also fit well with a publication project (DOIs for assets, etc.)!

natct10, Nov 05 '20 15:11
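
One way to keep generated outputs out of the repository in the first place is to list their folders in .gitignore. The helper below is a sketch under assumptions: the name ignore_outputs and the example patterns (results/, *.h5) are hypothetical and do not come from the repository.

```shell
# ignore_outputs FILE PATTERN... (hypothetical helper): append each pattern to
# FILE unless an identical line is already present.
ignore_outputs() {
  file=$1
  shift
  touch "$file"
  for pattern in "$@"; do
    # -x: whole-line match, -F: literal string, so "*.h5" is not a regex
    grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
  done
}

# Usage (patterns are placeholders, adapt them to the real output folders):
#   ignore_outputs .gitignore 'results/' '*.h5'
```

Ignoring a path only affects untracked files. Files that are already tracked must also be untracked with git rm -r --cached, and neither step removes the copies already stored in the history.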

So, the target is: patience_sept_2020-09-07_17h37 from catastrophic forgetting, in the "dossier resultats" (results folder) commit. Thank you @celinejacques for the check!

natct10, Nov 05 '20 15:11

Great to see that the target has been found! @natct10, did you have time to remove the commit? Can we close this issue?

arthurPignet, Dec 11 '20 11:12
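
Removing the offending files from the history could look like the sketch below. It assumes the separate git-filter-repo tool is installed and that the command runs in a fresh clone. The function name purge_path and the DRY_RUN variable are hypothetical, and the path to pass is whatever the blob listing reported (the thread does not give the full path).

```shell
# purge_path PATH (hypothetical helper): drop PATH from every commit.
# By default (DRY_RUN=1) it only prints the command, because rewriting
# history is destructive. Set DRY_RUN=0 to actually run it.
purge_path() {
  set -- git filter-repo --path "$1" --invert-paths
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # show what would be executed
  else
    "$@"                 # requires git-filter-repo on the PATH
  fi
}

# Usage (the path is a placeholder):
#   purge_path path/to/heavy/folder
#   DRY_RUN=0 purge_path path/to/heavy/folder
```

After such a rewrite the branches have to be force-pushed, and every contributor needs a fresh clone, since the old and new histories no longer share commits.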

By the way, we really should separate the code from its outputs (reports), which could be hosted on this open-science platform: https://osf.io/. This would also fit well with a publication project (DOIs for assets, etc.)!

I suggest opening a new issue to discuss that; I think it's a good idea.

arthurPignet, Dec 11 '20 11:12