git-lfs-migrate
Performance degradation on huge repositories
There is a performance degradation on huge repositories (>250 000 objects). It looks like the root cause is the very large number of files. It would be much better to generate a pack file for every 10 000 objects.
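A minimal Java sketch of the batching idea, assuming the converter can buffer converted objects and hand them off in groups. `PackBatcher` and its `flushPack` callback are hypothetical names, not part of git-lfs-migrate or JGit; the callback only stands in for whatever would actually write one pack file per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not git-lfs-migrate or JGit API): buffers object IDs
// and hands them off in fixed-size batches, e.g. one pack per 10 000 objects
// instead of one loose file per object.
final class PackBatcher {
    private final int batchSize;
    private final Consumer<List<String>> flushPack; // stand-in for "write one pack file"
    private final List<String> pending = new ArrayList<>();

    PackBatcher(int batchSize, Consumer<List<String>> flushPack) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.flushPack = flushPack;
    }

    // Queue one converted object; emit a full batch once the threshold is reached.
    void add(String objectId) {
        pending.add(objectId);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Emit whatever is left; call once more at the end of the conversion.
    void flush() {
        if (pending.isEmpty()) return;
        flushPack.accept(new ArrayList<>(pending)); // hand off a copy of the batch
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> packSizes = new ArrayList<>();
        PackBatcher batcher = new PackBatcher(10_000, batch -> packSizes.add(batch.size()));
        for (int i = 0; i < 25_000; i++) {
            batcher.add(String.format("%040x", i)); // fake 40-hex-digit object IDs
        }
        batcher.flush();
        System.out.println(packSizes); // [10000, 10000, 5000]
    }
}
```

At 10 000 objects per batch, ~250 000 objects would end up in about 25 pack files rather than ~250 000 separate files.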
Hi @bozaro
I noticed that there was a significant performance degradation between release 0.2.4 and 0.2.5.
For the same repository:
# with release 0.2.4:
[main] INFO git.lfs.migrate.Main - Convert time: 53224470
real 887m6.923s
user 1578m23.016s
sys 33m7.852s
# with release 0.2.5
[main] INFO git.lfs.migrate.Main - Convert time: 140884059
real 2348m5.986s
user 2231m8.880s
sys 191m58.924s
Also, while running with 0.2.5, writing objects seems to slow down considerably once a number of them have been written.
While the process was outputting lines like
[main] INFO git.lfs.migrate.Main - processed: 1356541/1523578
[main] INFO git.lfs.migrate.Main - processed: 1356542/1523578
I took a few file counts in the target directory:
$ find objects/ -type f | wc -l ; \
find lfs/ -type f | wc -l ; \
sleep 1 ; \
find objects/ -type f | wc -l ; \
find lfs/ -type f | wc -l
2305710
275383
2305711
275383
Regarding memory consumption, at about the same time:
$ ps -eo vsize,rssize,cmd
25498152 17806716 java -Xmx20g -Xms20g ...
During this stage, only one CPU core was in use, running at 100% almost entirely in user time. Throughout the whole process I noticed no significant I/O wait time; CPU time was mostly user time (plus a bit of sys).
The filesystem was Btrfs on an SSD.
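For context on the file counts above: git stores each loose object as its own file at `objects/<first two hex digits of the ID>/<remaining 38 digits>`, so all loose objects share 256 fan-out directories. The sketch below assumes nearly all of the 2,305,710 files counted under `objects/` are loose objects (any pack or index files would make the real figure slightly lower) and estimates how many entries each directory holds. `LooseObjectMath` is a hypothetical illustration class.

```java
// Hypothetical illustration class; mirrors git's loose-object layout.
final class LooseObjectMath {
    // Path of a loose object: objects/<2 hex digits>/<remaining 38 hex digits>.
    static String looseObjectPath(String objectId) {
        if (objectId.length() != 40) {
            throw new IllegalArgumentException("expected a 40-hex-digit SHA-1");
        }
        return "objects/" + objectId.substring(0, 2) + "/" + objectId.substring(2);
    }

    // Average files per fan-out directory, assuming IDs spread evenly over 256 prefixes.
    static double filesPerFanoutDir(long fileCount) {
        return fileCount / 256.0;
    }

    public static void main(String[] args) {
        System.out.println(looseObjectPath("974270d180385acf4044b68557cbe8767ebf1ab4"));
        // objects/97/4270d180385acf4044b68557cbe8767ebf1ab4
        System.out.printf("%.0f files per directory%n", filesPerFanoutDir(2_305_710L));
        // 9007 files per directory
    }
}
```

The two counts, taken a little over one second apart, also differ by a single file, i.e. at most about one new object per second at that point.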
How can I write a pack file for every 10 000 objects? I tried to convert a big repo with lfs-test-server, but after the conversion (about 4 days) lfs-test-server only has the file-name metadata; no files appear inside its folder. (I've tried the same setup with a small repo and it succeeded.) It's too slow to debug.
Between 0.2.4 and 0.2.5, the commit most likely to impact performance was 974270d180385acf4044b68557cbe8767ebf1ab4, which was aimed at reducing memory consumption.
It looks like it dropped a DAG library in favour of a local implementation of commit graph tracking. If this is the cause, perhaps there's another way of solving the memory issue without losing the performance gains of using a DAG library.
While trying to convert a 3.6 GB repo to LFS, I noticed a dramatic slowdown at around 1289037/1371200 objects. It might have been slowing down before that, but I see the following:
[main] INFO git.lfs.migrate.Main - processed: 10984/1371200
[main] INFO git.lfs.migrate.Main - processed: 11815/1371200
and
[main] INFO git.lfs.migrate.Main - processed: 1176360/1371200
[main] INFO git.lfs.migrate.Main - processed: 1176644/1371200
and
[main] INFO git.lfs.migrate.Main - processed: 1289268/1371200
[main] INFO git.lfs.migrate.Main - processed: 1289273/1371200
So throughput dropped from around 600-700 objects per second to about 5. After leaving it running for a while, it seems to slow even further, to about 1 per second:
[main] INFO git.lfs.migrate.Main - processed: 1307815/1371200
[main] INFO git.lfs.migrate.Main - processed: 1307816/1371200
[main] INFO git.lfs.migrate.Main - processed: 1307817/1371200
[main] INFO git.lfs.migrate.Main - processed: 1307818/1371200
Running the VisualVM sampler over it, I see the following percentages:
git.lfs.migrate.Main.main() 100.0 46,804 ms (100%) 46,804 ms
git.lfs.migrate.Main.processRepository() 100.0 46,804 ms (100%) 46,804 ms
git.lfs.migrate.Main.processSingleThread() 100.0 46,804 ms (100%) 46,804 ms
git.lfs.migrate.GitConverter.convertTask() 97.000435 45,400 ms (97%) 45,400 ms
org.eclipse.jgit.revwalk.RevWalk.parseAny() 96.787125 45,300 ms (96.8%) 45,300 ms
org.eclipse.jgit.lib.ObjectReader.open() 82.76274 38,736 ms (82.8%) 38,736 ms
org.eclipse.jgit.internal.storage.file.WindowCursor.open() 82.76274 38,736 ms (82.8%) 38,736 ms
org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject() 82.76274 38,736 ms (82.8%) 38,736 ms
org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate() 82.76274 38,736 ms (82.8%) 38,736 ms
org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject() 82.76274 38,736 ms (82.8%) 38,736 ms
org.eclipse.jgit.internal.storage.file.PackFile.get() 82.76274 38,736 ms (82.8%) 38,736 ms
org.eclipse.jgit.internal.storage.file.PackFile.load() 82.76274 38,736 ms (82.8%) 38,736 ms
org.eclipse.jgit.internal.storage.file.PackFile.decompress() 79.77893 37,339 ms (79.8%) 37,339 ms
org.eclipse.jgit.internal.storage.file.WindowCursor.inflate() 79.77893 37,339 ms (79.8%) 37,339 ms
java.util.zip.Inflater.inflate() 61.16066 28,625 ms (61.2%) 28,625 ms
org.eclipse.jgit.internal.storage.file.WindowCursor.prepareInflater() 16.888159 7,904 ms (16.9%) 7,904 ms
java.util.zip.Inflater.reset() 16.888159 7,904 ms (16.9%) 7,904 ms
So 96.8% of the time is spent in RevWalk.parseAny(), most of it inflating compressed pack data (Inflater.inflate() and Inflater.reset()), and it becomes really slow at a certain point. I hope this information helps.
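To put the reported throughput in perspective, here is a small sketch that parses a progress line from the log and estimates the remaining time at a constant rate. `ProgressEta` is a hypothetical helper, and the 650 objects/s figure is just the midpoint of the reported 600-700 range; real throughput was not constant, so these are rough bounds only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses progress lines such as
// "[main] INFO git.lfs.migrate.Main - processed: 1307818/1371200"
// and estimates the remaining time at a constant throughput.
final class ProgressEta {
    private static final Pattern PROGRESS = Pattern.compile("processed: (\\d+)/(\\d+)");

    // Returns {done, total}, or null if the line is not a progress line.
    static long[] parse(String logLine) {
        Matcher m = PROGRESS.matcher(logLine);
        if (!m.find()) return null;
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    // Hours left if the converter keeps processing objectsPerSecond objects per second.
    static double hoursRemaining(long done, long total, double objectsPerSecond) {
        return (total - done) / objectsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long[] p = parse("[main] INFO git.lfs.migrate.Main - processed: 1307818/1371200");
        System.out.println((p[1] - p[0]) + " objects left"); // 63382 objects left
        for (double rate : new double[] {650, 5, 1}) {
            System.out.printf("at %.0f obj/s: %.2f h%n", rate, hoursRemaining(p[0], p[1], rate));
        }
    }
}
```

At the 1307818/1371200 mark, the remaining 63 382 objects would take roughly 17.6 hours at 1 object per second, versus under two minutes at the earlier rate.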
Are there any ideas on how to fix the problem?
I am still having this problem in 2022 with the latest version. It uses ~1 of 32 cores, ~1 GB of the 64 GB of RAM available, and <10% disk I/O, and the import has been running for 13 days so far. :D Is there a workaround to force it to be more parallel and/or use more RAM?
This repository hasn't changed since 2016; it's a miracle it still works! You could try reverting https://github.com/bozaro/git-lfs-migrate/commit/974270d180385acf4044b68557cbe8767ebf1ab4 yourself locally, or just use version 0.2.4. Or look at the 'network' around this repo to find forks that people have picked up and fixed/maintained!
It has been 5 years, so I'm no longer certain which I did at the time, but I have a vague recollection that the commit reverted cleanly!
@leth Thanks for attempting to help. This seems to be a part of the current official release of Git (I have git-lfs/3.0.2 (GitHub; windows amd64; go 1.17.2)), what would be the way of getting this change reverted in a new release?
Sorry, I only have a contributor badge here because a PR of mine was merged once; I have no permissions on this project!
It sounds like you'd need to find the project you downloaded git/github/git-lfs from and let them know they're bundling an unmaintained tool with known bugs 🤷🏻‍♂️