oshdb
Removing old test-files from git-history
We have some very old files, largely test data, in our git history, which have let our repository grow to its current size of 120 MB. Files in the history like the following could be wiped to reduce the repository size:
- 46b7725b3c4a 1,5MiB core/oshdb/src/test/resources/data/hosmdb_keytables.mv.db
- 74bc041433cc 1,5MiB oshpbf-parser/src/test/resources/org/heigit/bigspatialdata/oshpbf/mapreduce/maldives.osh.pbf
- 2136edb4ea5b 1,6MiB test-data/equatorial-guinea.osh.pbf
- 700dd9c55ccc 2,0MiB core/oshdb-tool/src/test/resources/maldives.osh.pbf
- 92d488930d05 2,0MiB test-data/faroe-islands.osh.pbf
- 8cbb0ad1ec34 3,0MiB test-data/andorra.osh.pbf
- c41cd0fa27d3 5,5MiB oshdb-api/src/test/resources/update-test-data.mv.db
- c997c5c33936 5,8MiB oshdb-api/src/test/resources/test-update-data.mv.db
- 6683c395170b 6,0MiB oshdb-api/src/test/resources/test-update-data.mv.db
- 6ec7f46aadf7 8,5MiB oshdb-util/src/main/resources/ne_10m_admin_0_map_units/ne_10m_admin_0_map_units.shp
- 3c699dd29a85 28MiB test-data/kathmandu.osh.pbf
- 2f67d705dbe9 78MiB core/oshdb/src/test/resources/data/hosmdb_way.mv.db
I used this command from Stack Overflow [1] to find those files:
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
A good tool for wiping files from the git history could be
- https://rtyley.github.io/bfg-repo-cleaner/
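A rough sketch of what such a cleanup could look like with BFG, assuming the jar has been downloaded to a hypothetical path `bfg.jar` and that a size threshold (rather than explicit file names) is used to select the blobs. The function name and its parameters are made up for illustration; the `git` and BFG options are the ones described in the BFG usage notes.

```shell
#!/bin/sh
# Sketch only: BFG_JAR is a hypothetical path to a downloaded BFG jar.
BFG_JAR=${BFG_JAR:-bfg.jar}

# Hypothetical helper: clone a bare mirror of repo_url into mirror_dir and
# strip every blob larger than max_size (e.g. 1M) from its history.
strip_large_blobs() {
    repo_url=$1
    mirror_dir=$2
    max_size=$3

    # BFG works on a bare mirror clone so that all branches and tags are rewritten.
    git clone --mirror "$repo_url" "$mirror_dir" || return 1
    java -jar "$BFG_JAR" --strip-blobs-bigger-than "$max_size" "$mirror_dir" || return 1

    # BFG only rewrites the refs; the unreferenced objects are dropped afterwards
    # by expiring the reflogs and running an aggressive gc.
    (cd "$mirror_dir" &&
        git reflog expire --expire=now --all &&
        git gc --prune=now --aggressive)
}
```

It could be called as `strip_large_blobs https://github.com/GIScience/oshdb oshdb.git 1M`. Nothing is changed on the remote until the rewritten mirror is pushed, so the result can be inspected locally first.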
What do you think about this?
[1] https://stackoverflow.com/questions/10622179/how-to-find-identify-large-commits-in-git-history
To remove them from master you have to rewrite the whole master history. For a public repo (with releases and forks) this is strongly discouraged, even though I would like to remove them. I'm pretty torn.
I believe the following 3 could be removed without (big) history-rewriting troubles, since they were not (yet) merged into master:
- c41cd0fa27d3 5,5MiB oshdb-api/src/test/resources/update-test-data.mv.db
- c997c5c33936 5,8MiB oshdb-api/src/test/resources/test-update-data.mv.db
- 6683c395170b 6,0MiB oshdb-api/src/test/resources/test-update-data.mv.db
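Whether a blob is really still outside master could be double-checked with something like the following sketch. The helper name `blob_reachable_from` is made up for illustration; it simply searches the object list of a ref for an abbreviated object ID, so it assumes the prefix is long enough to be unambiguous.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an object whose ID starts with the given
# prefix is reachable from the given ref (branch, tag or commit).
# Must be run inside the repository being inspected.
blob_reachable_from() {
    ref=$1
    prefix=$2
    # rev-list --objects prints "<object id> <path>" for every reachable object.
    git rev-list --objects "$ref" | grep -q "^$prefix"
}
```

For example, `blob_reachable_from master c41cd0fa27d3` returning a non-zero status would suggest that this blob can be dropped by rewriting only the unmerged branch.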
For the rest… I don't know. The 100MB+ repo size is not great, but rewriting history of the whole project (incl. all branches) is also quite troublesome.
We could just recommend that people create shallow clones when disk usage or slow connections are an issue (e.g. git clone --depth=1 https://github.com/GIScience/oshdb)?
$ git clone --depth=1 https://github.com/GIScience/oshdb
…
$ du -hs oshdb
7.9M oshdb