
[tasks/github] Batch processing for PR review comments collection

Open · shlokgilda opened this pull request 2 weeks ago • 2 comments

Description

  • Adds batched processing to collect_pull_request_review_comments to reduce memory usage
  • Processes comments in batches of 1000 instead of accumulating all in memory before insertion
  • Combines contributor and comment extraction into a single pass (was two separate loops)
  • Extracts shared _flush_contributors helper function for code reuse between PR reviews and PR review comments
  • Adds defensive batch trigger that checks both pr_review_comment_dicts and contributors list sizes to prevent unbounded memory growth in edge cases
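The batching pattern described above can be sketched roughly as follows. This is an illustrative sketch only: names like `paginate_resource` and `flush_batch` are stand-ins, not Augur's actual API, and the extracted fields are hypothetical.

```python
# Hypothetical sketch of the batched, single-pass collection pattern.
# paginate_resource and flush_batch are illustrative stand-ins.
BATCH_SIZE = 1000

def collect_comments_batched(paginate_resource, flush_batch):
    comment_dicts = []
    contributors = []
    for comment in paginate_resource():
        # Single pass: extract both the contributor and the comment record.
        contributors.append(comment["user"])
        comment_dicts.append({"id": comment["id"], "body": comment["body"]})
        # Defensive trigger: flush when EITHER list reaches the batch cap,
        # so neither can grow without bound in edge cases.
        if len(comment_dicts) >= BATCH_SIZE or len(contributors) >= BATCH_SIZE:
            flush_batch(comment_dicts, contributors)
            comment_dicts.clear()
            contributors.clear()
    # Flush any final partial batch.
    if comment_dicts or contributors:
        flush_batch(comment_dicts, contributors)
```

Because the generator is consumed lazily and both accumulators are cleared after each flush, peak memory is bounded by the batch size rather than by the repository's total comment count.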

Dependencies

  • This PR should be merged after #3439 as it builds on that branch

Notes for Reviewers

  • Memory impact: For a repo with many PR review comments, old code loaded all comments into memory via list(github_data_access.paginate_resource(...)). New code streams from the generator and caps batches at ~1000.
  • Follows the same batching pattern used in collect_pull_request_reviews from #3439
  • The _flush_contributors helper is now shared between both PR reviews and PR review comments flush functions for consistency
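To make the memory contrast concrete, here is a minimal before/after sketch. The function names are assumed for illustration and are not Augur's exact code.

```python
# Old approach (illustrative): materialize every page into one list
# before any insertion, so memory scales with the whole repo.
#   all_comments = list(github_data_access.paginate_resource(url))

# New approach: stream from the generator and cap in-memory batches.
def stream_in_batches(pages, batch_size=1000):
    batch = []
    for item in pages:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

The caller then inserts each yielded batch and discards it, so at most one batch (~1000 items) is resident at a time.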

Testing

  • Tested this code with a few larger repos (>50K PRs/issues). Works fine.

Signed commits

  • [x] Yes, I signed my commits.

AI Disclosure: I used Claude Code to write this PR draft and generate docstrings.

shlokgilda avatar Dec 09 '25 19:12 shlokgilda

converting to draft because of

This PR should be merged after https://github.com/chaoss/augur/pull/3439 as it builds on that branch

MoralCode avatar Dec 09 '25 21:12 MoralCode

@MoralCode : I really appreciate the safety move of switching this to draft so somebody (me) doesn't merge these in the wrong order. :)

sgoggins avatar Dec 09 '25 22:12 sgoggins

I ran this against tensorflow/tensorflow (75K+ PRs) and a few other large repos. A few things I validated:

  1. No errors during collection (workers completed successfully)
  2. Database values matched expected counts (compared PR count vs reviews inserted)
  3. Memory usage stayed stable (no gradual climb like before)

The dict comprehension part is actually pretty straightforward: it's just the two-pass loop collapsed into one. Instead of looping through the reviews twice (once for contributors, once for review data), we do both extractions in the same pass.
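The collapse of the two-pass loop can be sketched like this (field names here are hypothetical, chosen just to show the shape of the change):

```python
# Toy data standing in for paginated review records.
reviews = [{"user": "alice", "state": "APPROVED"},
           {"user": "bob", "state": "COMMENTED"}]

# Before: two separate passes over the same data.
contributors = [r["user"] for r in reviews]
review_rows = [{"state": r["state"]} for r in reviews]

# After: both extractions in a single pass.
contributors2, review_rows2 = [], []
for r in reviews:
    contributors2.append(r["user"])
    review_rows2.append({"state": r["state"]})

# Same results, but the single pass pairs naturally with batched
# flushing, since both lists fill and empty together.
assert contributors == contributors2
assert review_rows == review_rows2
```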

shlokgilda avatar Dec 20 '25 16:12 shlokgilda