Excessive memory usage when accessing blob.size from a lot of blobs

Open SoniEx2 opened this issue 3 years ago • 0 comments

When scanning a repo like this:

import pygit2

# `repo` is assumed to be an already-open pygit2.Repository
todocommits = set()

for ref in repo.references:
    ref = repo.references.get(ref)
    todocommits.add(ref.peel(pygit2.Commit))

todotrees = set()

while todocommits:
    c = todocommits.pop()
    todotrees.add(c.tree)
    todocommits.update(c.parents)

todoblobs = {}

while todotrees:
    t = todotrees.pop()
    for obj in t:
        if isinstance(obj, pygit2.Blob):
            blobmeta = todoblobs.setdefault(obj, [])
            blobmeta += [(obj.filemode, obj.name)]
            # obj.size
        elif isinstance(obj, pygit2.Tree):
            todotrees.add(obj)
        else:
            raise TypeError

# while todoblobs: ...

accessing obj.size at the commented-out line (# obj.size) is the difference between the process being killed by the OOM killer and peaking at only about 160 MiB of RAM.
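A possible workaround, as a minimal sketch only: it assumes (not confirmed in this report) that reading obj.size makes each Blob hold on to loaded data, and that the memory stays allocated because the scan keeps every Blob alive as a dict key. Under that assumption, keying the dict by the blob's id and storing only plain Python values lets each Blob be released as soon as the loop moves on. BlobInfo and record_blob are hypothetical names, not part of pygit2.

```python
from dataclasses import dataclass, field


@dataclass
class BlobInfo:
    """Plain-Python summary of a blob; holds no reference to the pygit2 object."""
    size: int
    entries: list = field(default_factory=list)  # (filemode, name) pairs


def record_blob(index, obj):
    """Record obj under its id so the caller can drop obj afterwards.

    obj is anything exposing .id, .size, .filemode and .name, which is
    what the scan above reads from the blobs it finds in a tree.
    """
    key = str(obj.id)  # hex string of the id, not the object itself
    info = index.get(key)
    if info is None:
        # size is read once per distinct blob id
        info = index[key] = BlobInfo(size=obj.size)
    info.entries.append((obj.filemode, obj.name))
    return info
```

In the scan above, this would replace the two blobmeta lines with record_blob(todoblobs, obj). Whether it actually avoids the memory growth is untested here.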

SoniEx2, Feb 04 '22 19:02