Parsing improvements: Speed up parsing for new branches / Improve KG for large files

Open dhirenmathur opened this issue 3 months ago • 5 comments

Summary by CodeRabbit

  • New Features

    • Global inference cache shared across projects for faster repeat processing.
    • Docstring requests now accept an optional metadata field.
  • Performance Improvements

    • Cache-aware batching, improved tokenization, and smarter reference resolution.
    • Chunking and consolidation for large nodes to stay within token limits.
  • Background Tasks

    • Automated cache cleanup (expired and least-accessed) with stats reporting.
  • Database

    • Added inference cache table and indexes; removed project linkage to enable global caching.
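To make the global cache idea concrete, here is a minimal sketch. It assumes entries are keyed by a hash of the node text rather than by project, which is what would let a second project reuse the first project's result. `GlobalInferenceCache` and `hash_text` are hypothetical names, and an in-memory dict stands in for the database table, so the PR's actual service may look quite different.

```python
import hashlib
from typing import Optional


def hash_text(text: str) -> str:
    """Return a SHA-256 hex digest of the stripped text (hypothetical helper)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


class GlobalInferenceCache:
    """In-memory stand-in for a cache table keyed only by content hash."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, text: str) -> Optional[str]:
        """Look up a stored inference and record a hit or a miss."""
        result = self._entries.get(hash_text(text))
        if result is None:
            self.misses += 1
        else:
            self.hits += 1
        return result

    def store(self, text: str, inference: str) -> None:
        """Save an inference under the hash of its source text."""
        self._entries[hash_text(text)] = inference


cache = GlobalInferenceCache()
snippet = "def add(a, b):\n    return a + b"

# First project: the lookup misses, so inference runs and is stored.
if cache.get(snippet) is None:
    cache.store(snippet, "Adds two numbers.")

# Second project with identical code: the lookup hits without a project id.
reused = cache.get(snippet + "\n")
```

Because the key carries no project identifier, identical code in any repository resolves to the same entry.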

dhirenmathur, Sep 30 '25 12:09

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled.
Title Check | ✅ Passed | The title highlights parsing performance and knowledge-graph improvements, which correspond to some of the changes such as chunk handling for large files, but it does not reference the major new inference caching system and related migrations that constitute the bulk of the PR. It therefore captures only part of the changes and is overly broad about the actual scope. The phrasing is clear and descriptive of one aspect but misses the primary functional addition. As a result, the title is only partially related to the full changeset.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch cache_parse

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c950f83bca24a0066016bd2083672977b63bd435 and 99084e2ca8f7542830d37b1ae512010965f222e1.

📒 Files selected for processing (7)
  • app/alembic/versions/20250928_simple_global_cache.py (1 hunks)
  • app/modules/parsing/knowledge_graph/inference_service.py (8 hunks)
  • app/modules/parsing/services/cache_cleanup_service.py (1 hunks)
  • app/modules/parsing/services/inference_cache_service.py (1 hunks)
  • app/modules/parsing/tasks/cache_cleanup_tasks.py (1 hunks)
  • app/modules/parsing/utils/cache_diagnostics.py (1 hunks)
  • potpie-ui (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
app/modules/parsing/services/inference_cache_service.py (1)
app/modules/parsing/models/inference_cache_model.py (1)
  • InferenceCache (16-35)
app/modules/parsing/knowledge_graph/inference_service.py (6)
app/core/database.py (1)
  • get_db (29-34)
app/modules/parsing/knowledge_graph/inference_schema.py (3)
  • DocstringRequest (6-9)
  • DocstringResponse (18-19)
  • DocstringNode (12-15)
app/modules/parsing/services/inference_cache_service.py (3)
  • InferenceCacheService (10-148)
  • get_cached_inference (14-50)
  • store_inference (52-128)
app/modules/parsing/utils/content_hash.py (2)
  • generate_content_hash (6-26)
  • is_content_cacheable (29-49)
app/modules/parsing/utils/cache_diagnostics.py (2)
  • analyze_cache_misses (10-106)
  • log_diagnostics_summary (109-146)
app/modules/projects/projects_service.py (1)
  • get_project_from_db_by_id_sync (220-233)
app/modules/parsing/services/cache_cleanup_service.py (1)
app/modules/parsing/models/inference_cache_model.py (1)
  • InferenceCache (16-35)
app/alembic/versions/20250928_simple_global_cache.py (1)
app/alembic/versions/20250923_add_inference_cache_table.py (2)
  • upgrade (20-63)
  • downgrade (66-73)
app/modules/parsing/utils/cache_diagnostics.py (2)
app/modules/parsing/models/inference_cache_model.py (1)
  • InferenceCache (16-35)
app/modules/parsing/utils/content_hash.py (1)
  • generate_content_hash (6-26)
app/modules/parsing/tasks/cache_cleanup_tasks.py (2)
app/core/database.py (1)
  • get_db (29-34)
app/modules/parsing/services/cache_cleanup_service.py (4)
  • CacheCleanupService (11-90)
  • cleanup_expired_entries (31-43)
  • cleanup_least_accessed (45-71)
  • get_cleanup_stats (73-90)
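The cleanup service listed above pairs expiry-based removal with least-accessed eviction and reports stats. A rough sketch of that two-step policy follows. `CacheEntry`, `drop_expired`, `evict_least_accessed`, and `run_cleanup` are hypothetical names, and a plain list stands in for the `InferenceCache` table, so this shows only the policy, not the PR's queries.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CacheEntry:
    """Hypothetical row: a content hash plus usage and expiry metadata."""
    content_hash: str
    access_count: int
    expires_at: datetime


def drop_expired(entries: list[CacheEntry], now: datetime) -> list[CacheEntry]:
    """Keep only entries whose expiry lies in the future."""
    return [e for e in entries if e.expires_at > now]


def evict_least_accessed(entries: list[CacheEntry], max_entries: int) -> list[CacheEntry]:
    """Keep the max_entries entries with the highest access counts."""
    ranked = sorted(entries, key=lambda e: e.access_count, reverse=True)
    return ranked[:max_entries]


def run_cleanup(
    entries: list[CacheEntry], now: datetime, max_entries: int
) -> tuple[list[CacheEntry], dict[str, int]]:
    """Apply both steps in order and report how many entries each removed."""
    unexpired = drop_expired(entries, now)
    kept = evict_least_accessed(unexpired, max_entries)
    stats = {
        "expired_removed": len(entries) - len(unexpired),
        "evicted": len(unexpired) - len(kept),
        "remaining": len(kept),
    }
    return kept, stats


now = datetime(2025, 9, 30, 12, 0)
entries = [
    CacheEntry("a", 10, now + timedelta(days=1)),
    CacheEntry("b", 1, now + timedelta(days=1)),
    CacheEntry("c", 5, now - timedelta(days=1)),  # already expired
    CacheEntry("d", 3, now + timedelta(days=1)),
]
kept, stats = run_cleanup(entries, now, max_entries=2)
```

Running expiry first means the size cap is only spent on entries that are still valid.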
🪛 Ruff (0.13.3)
app/modules/parsing/services/inference_cache_service.py

  • 15-15: Unused method argument: project_id (ARG002)
  • 126-126: Use raise without specifying exception name. Remove exception name (TRY201)

app/modules/parsing/knowledge_graph/inference_service.py

  • 379-379: Unused method argument: project_id (ARG002)
  • 398-398: Do not catch blind exception: Exception (BLE001)
  • 615-615: Do not catch blind exception: Exception (BLE001)
  • 849-849: Do not catch blind exception: Exception (BLE001)
  • 926-926: Do not catch blind exception: Exception (BLE001)
  • 980-980: Do not catch blind exception: Exception (BLE001)
  • 990-990: Consider moving this statement to an else block (TRY300)
  • 992-992: Do not catch blind exception: Exception (BLE001)
  • 993-993: Use logging.exception instead of logging.error. Replace with exception (TRY400)

app/modules/parsing/utils/cache_diagnostics.py

  • 91-91: Do not catch blind exception: Exception (BLE001)
  • 92-92: Use logging.exception instead of logging.error. Replace with exception (TRY400)

app/modules/parsing/tasks/cache_cleanup_tasks.py

  • 21-21: Consider moving this statement to an else block (TRY300)
  • 46-46: Consider moving this statement to an else block (TRY300)
  • 67-67: Consider moving this statement to an else block (TRY300)
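For reference, the four exception-handling findings (BLE001, TRY201, TRY300, TRY400) can usually be resolved together with one pattern. The function below is a hypothetical example, not code from this PR. It only shows the shape Ruff is asking for.

```python
import logging

logger = logging.getLogger(__name__)


def parse_count(raw: str) -> int:
    """Convert raw text to a non-negative integer, logging failures."""
    try:
        value = int(raw)
    except ValueError:
        # Catch the specific error rather than a blind `except Exception` (BLE001).
        # logger.exception records the traceback, unlike logger.error (TRY400).
        logger.exception("Could not parse count from %r", raw)
        # A bare `raise` re-raises the active exception; no name needed (TRY201).
        raise
    else:
        # Success-path code sits in `else`, outside the `try` body (TRY300).
        return max(value, 0)
```

Keeping the `try` body to the single call that can fail also makes it clearer which exception type is worth catching.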

🔇 Additional comments (1)
potpie-ui (1)

1-1: Confirm the potpie-ui submodule update.

Only the pointer changed, so I can’t review the underlying UI code here. Please confirm the linked commit was vetted (tests, lint, sanity checks) or share a brief summary of what it contains so we know the update is intentional.



coderabbitai[bot], Sep 30 '25 12:09

@coderabbitai generate docstring

dhirenmathur, Sep 30 '25 13:09

[!TIP] For best results, initiate chat on the files or code changes.

Task: Generate docstrings for PR #452

I'll generate docstrings for all public classes, methods, and functions in the new inference caching system to raise coverage from 58.14% to the 80% threshold.

Files requiring docstrings:

  1. app/modules/parsing/services/inference_cache_service.py - InferenceCacheService class and methods
  2. app/modules/parsing/services/cache_cleanup_service.py - CacheCleanupService class and methods
  3. app/modules/parsing/knowledge_graph/inference_service.py - Updated methods and new helper functions
  4. app/modules/parsing/utils/cache_diagnostics.py - Diagnostic utility functions
  5. app/modules/parsing/utils/content_hash.py - Hash generation utilities
  6. app/modules/parsing/models/inference_cache_model.py - ORM model class
  7. app/modules/parsing/tasks/cache_cleanup_tasks.py - Celery task functions
  8. app/alembic/versions/ - Migration scripts

@dhirenmathur I'll generate comprehensive docstrings across all 8 files in the inference caching system. The docstrings will follow Google-style format and cover:

  • Service classes: InferenceCacheService and CacheCleanupService with all methods
  • Updated inference logic: New cache-aware methods in InferenceService
  • Utility functions: Cache diagnostics and content hashing utilities
  • ORM model: InferenceCache table documentation
  • Celery tasks: Cache cleanup periodic tasks
  • Migration scripts: Database schema change documentation

This will bring your docstring coverage from 58.14% to well above the 80% threshold. The docstrings will include detailed descriptions, parameter specifications, return values, and any raised exceptions.
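As an illustration of the Google-style layout described here (summary line, then `Args`, `Returns`, and `Raises` sections), a docstring might look like the sketch below. `truncate_tokens` is a hypothetical function chosen for brevity and is not part of the PR.

```python
def truncate_tokens(tokens: list[str], limit: int) -> list[str]:
    """Return at most ``limit`` tokens from the start of a sequence.

    Args:
        tokens: Tokens in their original order.
        limit: Maximum number of tokens to keep. Must be non-negative.

    Returns:
        A new list containing the first ``limit`` tokens.

    Raises:
        ValueError: If ``limit`` is negative.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return tokens[:limit]
```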


⏳ Preparing changes...

coderabbitai[bot], Sep 30 '25 13:09

[!CAUTION] The CodeRabbit agent's plans did not produce any file changes.

coderabbitai[bot], Sep 30 '25 14:09