pygit2
Excessive memory usage when accessing blob.size from a lot of blobs
When scanning a repository like this:
import pygit2

# repo is assumed to be an already opened pygit2.Repository.

# Collect the commit each reference points to.
todocommits = set()
for ref in repo.references:
    ref = repo.references.get(ref)
    todocommits.add(ref.peel(pygit2.Commit))

# Walk the history and collect every root tree.
todotrees = set()
while todocommits:
    c = todocommits.pop()
    todotrees.add(c.tree)
    todocommits.update(c.parents)

# Walk the trees and record (filemode, name) for every blob.
todoblobs = {}
while todotrees:
    t = todotrees.pop()
    for obj in t:
        if isinstance(obj, pygit2.Blob):
            blobmeta = todoblobs.setdefault(obj, [])
            blobmeta.append((obj.filemode, obj.name))
            # obj.size
        elif isinstance(obj, pygit2.Tree):
            todotrees.add(obj)
        else:
            raise TypeError

# while todoblobs: ...
Accessing obj.size at the commented-out point (the "# obj.size" line) is the difference between the process getting killed by the OOM killer and using only about 160 MiB of peak RAM.
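
To compare the two variants, the peak memory figure can be read from the process itself. The following is only a minimal sketch using the standard library; the helper name peak_rss_mib is made up for illustration and is not part of pygit2, and the resource module is only available on Unix-like systems.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB (hypothetical helper)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kibibytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mib()
# ... run the scan here, once with and once without the obj.size access ...
after = peak_rss_mib()
print(f"peak RSS: {before:.1f} MiB -> {after:.1f} MiB")
```

Because the value is a high-water mark for the whole process, each variant should be run in a fresh interpreter so that one run does not mask the other.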