[tasks] Fix memory issues from large in-memory lists in collection tasks
Description
This PR addresses memory issues when collecting data from repositories with large datasets (10,000+ issues/PRs/contributors). Fixes #3404.
Key Changes:
- Generator pattern for issues: Prevents loading all issues into memory at once
- Batch processing: Insert data in 1000-item batches across all collection tasks
- `.clear()` over reassignment: Reuses list objects instead of creating new ones, reducing GC pressure (see the sketch after this list)
- Move inserts outside loops: In PR reviews, contributors and reviews are already in memory, so batching the final insert is safe and more efficient
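A rough sketch of the generator + batching pattern, for reviewers. The names here (`fetch_issue_pages`, `bulk_insert`, `iter_issues`) are placeholders for illustration, not the actual task helpers in this PR:

```python
# Placeholder helpers: fetch_issue_pages() yields pages of issues from the
# GitHub API; bulk_insert() stands in for the database insert the tasks use.

BATCH_SIZE = 1000  # balances memory usage against database round trips


def iter_issues(fetch_issue_pages):
    """Yield issues one at a time instead of building one giant list."""
    for page in fetch_issue_pages():
        for issue in page:
            yield issue


def collect_issues(fetch_issue_pages, bulk_insert):
    batch = []
    for issue in iter_issues(fetch_issue_pages):
        batch.append(issue)
        if len(batch) >= BATCH_SIZE:
            bulk_insert(batch)
            batch.clear()  # reuse the same list object instead of reassigning
    if batch:  # flush the final partial batch
        bulk_insert(batch)
```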
Notes for Reviewers
- All changes preserve the existing logic; they only optimize for memory efficiency
- Batch size of 1000 balances memory usage vs. database round trips
- PR reviews refactor moves inserts outside the loop: reduces N database operations to 1 bulk insert (safe since `all_pr_reviews` is already in memory); sketched below
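For reference, a minimal sketch of the shape of the PR reviews refactor, again with placeholder names (`pull_requests`, `get_reviews`, `bulk_insert`) standing in for the real task code:

```python
def collect_pr_reviews(pull_requests, get_reviews, bulk_insert):
    all_pr_reviews = []
    for pr in pull_requests:
        # Before: an insert per PR inside this loop -> N database round trips
        all_pr_reviews.extend(get_reviews(pr))
    if all_pr_reviews:
        # After: one bulk insert once everything is in memory -> 1 round trip
        bulk_insert(all_pr_reviews)
```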
Signed commits
- [x] Yes, I signed my commits.
GenAI Disclosure: Claude Code was used to generate this PR draft and review diff changes for logical correctness and potential performance issues.
It's a slightly longer PR, but I think it's important and within scope, since all of these issues were possible causes of OOM exceptions.
@shlokgilda has been thoroughly testing this and has confirmed that the facade workers are flowing and secondary is not memory bottlenecked anymore.
I trust this as far as testing goes and am going to mark this as ready.
I think I was the one who removed that import because pylint suggested it was unused. I haven't personally tested that change, so it's probably worth reverting just in case.
Rebased, fixed the merge conflict with the string fields fix (#3434), and corrected my pylint bug.