Add LLM usage and retry tracking for indexing stage
Description
Add LLM usage and retry tracking for the indexing stage, so that token consumption, call counts, and retry behavior are observable per workflow and across the whole pipeline run.
Related Issues
[Feature Request]: Enhance LLM usage logging in indexing workflows #2103
Proposed Changes
- Data Structure Extensions
  - Added to `PipelineRunStats` (sketch below):
    - `total_llm_retries`: Total retry attempts across all workflows
    - `llm_usage_by_workflow[workflow]["retries"]`: Per-workflow retry count
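A minimal sketch of the extended stats structure, assuming a dataclass-style `PipelineRunStats`; the field names come from the sample `stats.json` further down, and any other fields of the real class are omitted:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PipelineRunStats:
    """Pipeline run statistics (sketch; only usage-related fields shown)."""

    total_llm_calls: int = 0
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0
    # New: total retry attempts across all workflows.
    total_llm_retries: int = 0
    # New: per-workflow bucket, e.g. {"extract_graph": {"llm_calls": 5, "retries": 6, ...}}.
    llm_usage_by_workflow: dict[str, dict[str, Any]] = field(default_factory=dict)
```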
- Context Injection Mechanism (sketch below)
  - Added an `inject_llm_context()` helper function
  - Centralized context injection in `run_pipeline.py`
  - Propagated through `ModelManager` to all LLM models
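Roughly, the injection path looks like the sketch below; `get_models()` and the `context` attribute are illustrative stand-ins for the real `ModelManager` and model interfaces:

```python
import logging

logger = logging.getLogger(__name__)


def inject_llm_context(model_manager, stats, workflow_name: str) -> None:
    """Attach the shared stats object to every registered model (sketch).

    Called once per workflow from run_pipeline.py so retry and token counts
    land in PipelineRunStats without each workflow wiring this up itself.
    """
    try:
        for model in model_manager.get_models():  # illustrative accessor
            model.context = {"stats": stats, "workflow": workflow_name}
    except Exception:
        # Injection failures are logged (see "Enhanced Logging") rather than
        # aborting the pipeline run.
        logger.exception("Failed to inject LLM context for workflow %s", workflow_name)
```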
- Retry Tracking (sketch below)
  - Added a `_record_retries()` common method to the `Retry` base class
  - All retry strategies (Exponential, Native, Random, Incremental) record retries uniformly
  - Used `finally` blocks to ensure both successful and failed retries are tracked
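The shared recording hook could look like the following sketch; the `Retry` base class and the `ExponentialRetry` strategy shown here are simplified, and the real signatures may differ:

```python
import asyncio


class Retry:
    """Base class for retry strategies (simplified sketch)."""

    def __init__(self, max_attempts: int = 5, context: dict | None = None):
        self.max_attempts = max_attempts
        self._context = context or {}

    def _record_retries(self, retries: int) -> None:
        """Add retries to the run total and to the current workflow's bucket."""
        stats = self._context.get("stats")
        workflow = self._context.get("workflow")
        if stats is None or retries == 0:
            return
        stats.total_llm_retries += retries
        if workflow is not None:
            usage = stats.llm_usage_by_workflow.setdefault(workflow, {})
            usage["retries"] = usage.get("retries", 0) + retries


class ExponentialRetry(Retry):
    async def retry(self, func, *args, **kwargs):
        retries = 0
        try:
            while True:
                try:
                    return await func(*args, **kwargs)
                except Exception:
                    if retries + 1 >= self.max_attempts:
                        raise
                    retries += 1
                    await asyncio.sleep(2**retries)  # exponential backoff
        finally:
            # Runs whether the final attempt succeeded or raised, so both
            # successful and failed retries are counted.
            self._record_retries(retries)
```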
- Enhanced Logging (sketch below)
  - Output LLM usage (including retries) after each workflow
  - Output total statistics after pipeline completion
  - Added exception logging for context injection failures
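For example, the per-workflow log line might be emitted as in this sketch; the message format is illustrative, and only the field names come from the stats structure above:

```python
import logging

logger = logging.getLogger(__name__)


def log_workflow_llm_usage(stats, workflow: str) -> None:
    """Log LLM usage (including retries) after a workflow finishes (sketch)."""
    usage = stats.llm_usage_by_workflow.get(workflow, {})
    logger.info(
        "workflow=%s llm_calls=%s prompt_tokens=%s completion_tokens=%s retries=%s",
        workflow,
        usage.get("llm_calls", 0),
        usage.get("prompt_tokens", 0),
        usage.get("completion_tokens", 0),
        usage.get("retries", 0),
    )
```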
Sample output in `stats.json`:

```json
{
  "total_llm_calls": 20,
  "total_prompt_tokens": 104652,
  "total_completion_tokens": 9691,
  "total_llm_retries": 8,
  "llm_usage_by_workflow": {
    "extract_graph": {
      "llm_calls": 5,
      "prompt_tokens": 66766,
      "completion_tokens": 5757,
      "retries": 6
    }
  }
}
```
Checklist
I've validated the functionality with end-to-end indexing runs.
- [x] I have tested these changes locally.
- [x] I have reviewed the code changes.
- [x] I have updated the documentation (if necessary).
- [N/A] I have added appropriate unit tests (if applicable).
Note: Both Linux and Windows smoke tests are failing with the same root cause: "ValidationError: API Key is required for chat when using api_key authentication". My changes do not affect configuration validation or authentication logic.
Additional Notes
@microsoft-github-policy-service agree company="Microsoft"