[tasks] Fix memory issues from large in-memory lists in collection tasks

Open shlokgilda opened this issue 1 month ago • 4 comments

Description

This PR addresses memory issues when collecting data from repositories with large datasets (10,000+ issues/PRs/contributors). Fixes #3404.

Key Changes:

  • Generator pattern for issues: Prevents loading all issues into memory at once
  • Batch processing: Insert data in 1000-item batches across all collection tasks
  • .clear() over reassignment: Reuses list objects instead of creating new ones, reducing GC pressure
  • Move inserts outside loops: In PR reviews, contributors and reviews are already in memory, so batching the final insert is safe and more efficient
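
As a rough illustration of the first three bullets, here is a minimal sketch of a generator feeding a batched insert. It is not the code from this PR: `fetch_issues`, `insert_in_batches`, and the `insert` callback are hypothetical names, and the page structure is assumed. Only the 1000-item batch size and the use of `.clear()` come from the description above.

```python
from typing import Callable, Iterable, Iterator

BATCH_SIZE = 1000  # batch size named in the PR description


def fetch_issues(pages: Iterable[list[dict]]) -> Iterator[dict]:
    """Yield issues one at a time instead of building one large list.

    `pages` stands in for a paginated API response (an assumption).
    """
    for page in pages:
        yield from page


def insert_in_batches(
    items: Iterable[dict],
    insert: Callable[[list[dict]], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Pass `items` to `insert` in chunks of at most `batch_size`.

    The same list object is reused via .clear(), so `insert` must finish
    with the batch (or copy it) before it returns.
    """
    batch: list[dict] = []
    total = 0
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            insert(batch)
            total += len(batch)
            batch.clear()  # reuse the list rather than rebinding a new one
    if batch:  # flush the final partial batch
        insert(batch)
        total += len(batch)
        batch.clear()
    return total
```

With this shape, at most one page plus one batch is held in memory at a time, regardless of how many issues the repository has.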

Notes for Reviewers

  • All changes preserve the existing logic; they only optimize for memory efficiency
  • Batch size of 1000 balances memory usage vs. database round trips
  • PR reviews refactor moves inserts outside the loop: reduces N database operations to 1 bulk insert (safe since all_pr_reviews is already in memory)
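
The last point can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual code: `all_pr_reviews` is named in the description, but its shape here (a dict mapping PR number to a list of review dicts) is a guess, and `collect_review_rows` and `bulk_insert` are hypothetical names.

```python
from typing import Callable


def collect_review_rows(
    all_pr_reviews: dict[int, list[dict]],
    bulk_insert: Callable[[list[dict]], None],
) -> int:
    """Build every review row first, then insert once after the loop.

    The "before" version would call the insert inside the loop, giving
    one database operation per PR. Because all_pr_reviews is already
    fully in memory, collecting rows first adds little extra memory.
    """
    rows: list[dict] = []
    for pr_number, reviews in all_pr_reviews.items():
        for review in reviews:
            rows.append({"pr_number": pr_number, **review})
    if rows:
        bulk_insert(rows)  # a single call instead of one per PR
    return len(rows)
```

If the combined row list could itself grow large, the single call could be replaced by fixed-size batches, which is the trade-off the batch-size note above describes.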

Signed commits

  • [x] Yes, I signed my commits.

GenAI Disclosure: Claude Code was used to generate this PR draft and review diff changes for logical correctness and potential performance issues.

shlokgilda avatar Nov 20 '25 16:11 shlokgilda

It's a slightly longer PR, but I think important and within scope since all these issues were possible causes of OOM exceptions.

shlokgilda avatar Nov 20 '25 16:11 shlokgilda

@shlokgilda has been thoroughly testing this and has confirmed that the facade workers are flowing and secondary is not memory bottlenecked anymore.

I trust this as far as testing goes and am going to mark this as ready

MoralCode avatar Dec 03 '25 19:12 MoralCode

I think I was the one who removed that import because pylint suggested it was unused. I haven't personally tested that change so prob worth reverting just in case.

MoralCode avatar Dec 06 '25 20:12 MoralCode

rebased, fixed the merge conflict with the string fields fix (#3434) and corrected my pylint bug

MoralCode avatar Dec 11 '25 14:12 MoralCode