go-git
High memory usage iterating over commits of large repository (store on filesystem)
Hi there. I apologize ahead of time if this issue is not directly related to a problem within this codebase. I'm unsure of the source of the problem I'm experiencing and would greatly appreciate any guidance!
I'm working on a project (https://github.com/augmentable-dev/tickgit) that traverses the git history of a repository looking for when certain lines of source were added (TODO items), using the iterators provided by go-git. For smaller repositories, everything works great. For larger ones, such as https://github.com/torvalds/linux (reading from a clone on my filesystem), I see extremely high memory consumption when iterating through commits (and inspecting their trees). I assume there's a memory leak, but I'm having trouble identifying the source - I'm a profiling n00b!
gopkg.in/src-d/go-git.v4/plumbing/format/idxfile.(*MemoryIndex).genOffsetHash
/Users/patrickdevivo/go/pkg/mod/gopkg.in/src-d/[email protected]/plumbing/format/idxfile/idxfile.go
Total: 510MB 510MB (flat, cum) 59.24%
199 . . count, err := idx.Count()
200 . . if err != nil {
201 . . return err
202 . . }
203 . .
204 510MB 510MB idx.offsetHash = make(map[int64]plumbing.Hash, count)
205 . . idx.offsetHashIsFull = true
206 . .
207 . . var hash plumbing.Hash
208 . . i := uint32(0)
209 . . for firstLevel, fanoutValue := range idx.Fanout {
I've gone through my code several times now looking for places where I might be unnecessarily holding onto memory, but I'm wondering whether I'm misusing the library or not cleaning something up. I'm pretty sure I'm closing all my file readers. Is this something anyone else has encountered, and does anyone know where the issue lies? Thanks so much!
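To put the allocation at line 204 of the listing in context, here is a minimal standalone sketch that measures how much heap a pre-sized `map[int64]` of 20-byte values takes. It is not go-git code: the `hash` type is a stand-in for `plumbing.Hash` (a 20-byte array), and `mapAllocBytes` and the entry count are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// hash is a stand-in for go-git's plumbing.Hash (a 20-byte array).
type hash [20]byte

// mapAllocBytes reports roughly how many heap bytes are allocated when
// pre-sizing a map[int64]hash for count entries, mirroring the shape of
// the make(...) call in the profile listing.
func mapAllocBytes(count int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	m := make(map[int64]hash, count)
	runtime.ReadMemStats(&after)
	runtime.KeepAlive(m) // keep the map live until after the measurement
	// TotalAlloc is cumulative, so the difference is the bytes allocated
	// between the two readings.
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	const count = 1_000_000 // hypothetical number of objects in a packfile
	b := mapAllocBytes(count)
	fmt.Printf("%d entries: %.1f MB (%.1f bytes/entry)\n",
		count, float64(b)/(1<<20), float64(b)/count)
}
```

Running this with a count close to the number of objects in the repository gives a rough idea of whether a figure like 510MB is simply the cost of one fully populated offset-to-hash map rather than a leak. The exact per-entry overhead depends on the Go version's map implementation.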