kaito icon indicating copy to clipboard operation
kaito copied to clipboard

feat: RAG metrics part 1

Open bangqipropel opened this issue 8 months ago • 14 comments

Reason for Change:

Requirements

  • [ ] added unit tests and e2e tests (if applicable).

Issue Fixed:

Notes for Reviewers:

bangqipropel avatar Apr 04 '25 10:04 bangqipropel

Title

(Describe updated until commit https://github.com/kaito-project/kaito/commit/e4a0576f086ae9a9b175bd52d688fbd5c625713a)

Add Metrics Collection for RAG Engine


Description

  • Added metrics collection for embedding and API requests.

  • Introduced Prometheus metrics for monitoring.

  • Enhanced error handling and logging for metrics recording.


Changes walkthrough 📝

Relevant files
Enhancement
huggingface_local_embedding.py
Add embedding metrics                                                                       

presets/ragengine/embedding/huggingface_local_embedding.py

  • Imported necessary modules for metrics.
  • Added _embed_with_retry method for embedding with metrics.
  • Added error handling and latency recording.
  • +35/-1   
    remote_embedding.py
    Add remote embedding metrics                                                         

    presets/ragengine/embedding/remote_embedding.py

  • Imported record_embedding_metrics.
  • Added @record_embedding_metrics decorator to _get_text_embedding.
  • Updated _get_query_embedding to use _get_text_embedding.
  • +3/-1     
    main.py
    Add API metrics and Prometheus endpoint                                   

    presets/ragengine/main.py

  • Imported Prometheus client and metrics.
  • Added /metrics endpoint for exposing Prometheus metrics.
  • Added timing and metrics recording for various API endpoints.
  • +125/-10
    helpers.py
    Add embedding metrics helper                                                         

    presets/ragengine/metrics/helpers.py

  • Added record_embedding_metrics decorator for synchronous functions.
  • Implemented metrics recording for success and failure cases.
  • +43/-0   
    prometheus_metrics.py
    Define Prometheus metrics                                                               

    presets/ragengine/metrics/prometheus_metrics.py

  • Defined Prometheus metrics for embedding and API requests.
  • Included labels for status and mode.
  • +49/-0   
    Documentation
    __init__.py
    Initialize metrics package                                                             

    presets/ragengine/metrics/init.py

    • Added package initialization for metrics.
    +6/-0     

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • kaito-pr-agent[bot] avatar Apr 04 '25 10:04 kaito-pr-agent[bot]

    PR Reviewer Guide 🔍

    (Review updated until commit https://github.com/kaito-project/kaito/commit/e4a0576f086ae9a9b175bd52d688fbd5c625713a)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Incorrect Metric Recording

    The decorator record_embedding_metrics incorrectly records the same metric labels for both success and failure cases in the exception handling block.

    # Record failed embedding
    rag_embedding_requests_total.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).inc()
    # Record latency even for failures
    rag_embedding_latency.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).observe(time.time() - start_time)
    
    Typo in Label

    There is a typo in the label name rag_indexes_document_requests_totall. It should be rag_indexes_document_requests_total.

    rag_indexes_document_requests_total = Counter('rag_indexes_document_requests_totall', 'Count of successful/failed calling \'/indexes/{index_name}/documents\' requests', labelnames=[STATUS_LABEL])
    

    kaito-pr-agent[bot] avatar Apr 04 '25 10:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/239cc9b26b2b7efd4b32b981c7f1e2e96dce378f

    kaito-pr-agent[bot] avatar Apr 04 '25 10:04 kaito-pr-agent[bot]

    PR Code Suggestions ✨

    Latest suggestions up to e4a0576

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Fix incorrect status in exception

    Update status to failure when recording metrics in exception handling.

    presets/ragengine/metrics/helpers.py [37-40]

     except Exception as e:
         # Record failed embedding
    -    rag_embedding_requests_total.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).inc()
    +    rag_embedding_requests_total.labels(status=STATUS_FAILURE, mode=MODE_REMOTE).inc()
         # Record latency even for failures
    -    rag_embedding_latency.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).observe(time.time() - start_time)
    +    rag_embedding_latency.labels(status=STATUS_FAILURE, mode=MODE_REMOTE).observe(time.time() - start_time)
         raise e
    
    Suggestion importance[1-10]: 8

    __

    Why: This suggestion fixes a critical bug in the exception handling by updating the status to failure when recording metrics, which is important for accurate monitoring.

    Medium
    Correctly re-raise exception

    Ensure the exception is re-raised after recording the failure status.

    presets/ragengine/embedding/huggingface_local_embedding.py [33-34]

     except Exception as e:
         status = STATUS_FAILURE
    -    raise e
    +    raise
    
    Suggestion importance[1-10]: 3

    __

    Why: The suggestion correctly modifies the exception re-raising mechanism, but it's a minor improvement and doesn't address a critical issue.

    Low
    Fix metric name typo

    Correct the typo in the metric name.

    presets/ragengine/metrics/prometheus_metrics.py [41]

    -rag_indexes_document_requests_total = Counter('rag_indexes_document_requests_totall', 'Count of successful/failed calling \'/indexes/{index_name}/documents\' requests', labelnames=[STATUS_LABEL])
    +rag_indexes_document_requests_total = Counter('rag_indexes_document_requests_total', 'Count of successful/failed calling \'/indexes/{index_name}/documents\' requests', labelnames=[STATUS_LABEL])
    
    Suggestion importance[1-10]: 3

    __

    Why: The suggestion corrects a typo in the metric name, which is necessary for proper metric tracking but is not a critical issue.

    Low

    Previous suggestions

    Suggestions up to commit e4a0576
    CategorySuggestion                                                                                                                                    Impact
    General
    Implement dynamic calculation for disk storage

    Ensure that DiskStorageRequirement is calculated dynamically based on the formula
    provided in the comment to avoid hardcoding values.

    pkg/model/interface.go [97]

     +DiskStorageRequirement string // Disk storage requirements for the model.
    ++// DiskStorageRequirement is calculated as:
    ++// (TotalGPUMemoryRequirement × 2.5 + 48) rounded up to the next multiple of 10.
    ++// This formula accounts for model weights, optimization files, and runtime overhead.
    ++// Example: For a 14Gi model, calculation is: 14 × 2.5 + 48 = 83, rounded up to 90Gi.
    ++
    ++func (p *PresetParam) CalculateDiskStorageRequirement() {
    ++    totalGPUMemory, _ := strconv.ParseFloat(p.TotalGPUMemoryRequirement, 64)
    ++    p.DiskStorageRequirement = fmt.Sprintf("%.0fGi", math.Ceil((totalGPUMemory*2.5+48)/10)*10)
    ++}
    
    Suggestion importance[1-10]: 8

    __

    Why: Dynamically calculating DiskStorageRequirement ensures accuracy and reduces the risk of hardcoded values becoming outdated.

    Medium
    Confirm disk storage calculation

    Verify that DiskStorageRequirement values are correctly calculated and updated
    according to the formula provided in the comments.

    presets/workspace/models/falcon/model.go [165]

    --DiskStorageRequirement:    "400",
     +DiskStorageRequirement:    "280Gi",
    
    Suggestion importance[1-10]: 5

    __

    Why: While the value is correct, verifying calculations ensures consistency across different models and future updates.

    Low
    Security
    Test new GPU provisioner version

    Ensure compatibility and test the new GPU provisioner version thoroughly before
    deployment.

    Makefile [6]

    --GPU_PROVISIONER_VERSION ?= 0.3.3
     +GPU_PROVISIONER_VERSION ?= 0.3.4
    
    Suggestion importance[1-10]: 7

    __

    Why: Testing the new version ensures compatibility and stability before deployment, which is important for security and functionality.

    Medium
    Suggestions up to commit e4a0576
    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Correct status on exception

    Update the status to failure when an exception occurs.

    presets/ragengine/metrics/helpers.py [37-40]

     except Exception as e:
         # Record failed embedding
    -    rag_embedding_requests_total.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).inc()
    +    rag_embedding_requests_total.labels(status=STATUS_FAILURE, mode=MODE_REMOTE).inc()
         # Record latency even for failures
    -    rag_embedding_latency.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).observe(time.time() - start_time)
    +    rag_embedding_latency.labels(status=STATUS_FAILURE, mode=MODE_REMOTE).observe(time.time() - start_time)
         raise e
    
    Suggestion importance[1-10]: 8

    __

    Why: This suggestion addresses a critical issue by ensuring the correct status is recorded when an exception occurs.

    Medium
    Correct metric name

    Fix typo in metric name.

    presets/ragengine/metrics/prometheus_metrics.py [41]

    -rag_indexes_document_requests_total = Counter('rag_indexes_document_requests_totall', 'Count of successful/failed calling \'/indexes/{index_name}/documents\' requests', labelnames=[STATUS_LABEL])
    +rag_indexes_document_requests_total = Counter('rag_indexes_document_requests_total', 'Count of successful/failed calling \'/indexes/{index_name}/documents\' requests', labelnames=[STATUS_LABEL])
    
    Suggestion importance[1-10]: 8

    __

    Why: Fixing a typo in the metric name is important for accurate metrics collection and should be addressed.

    Medium
    Correctly re-raise exception

    Ensure that the exception is re-raised after recording the failure status.

    presets/ragengine/embedding/huggingface_local_embedding.py [32-34]

     except Exception as e:
         status = STATUS_FAILURE
    -    raise e
    +    raise
    
    Suggestion importance[1-10]: 2

    __

    Why: The suggestion is correct but only asks the user to verify or ensure a change, which reduces its score.

    Low
    Suggestions up to commit a3420b2
    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Fix incorrect status label

    Correct the status label to STATUS_FAILURE when recording failed embeddings.

    presets/ragengine/metrics/helpers.py [37-40]

     except Exception as e:
         # Record failed embedding
    -    rag_embedding_requests_total.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).inc()
    +    rag_embedding_requests_total.labels(status=STATUS_FAILURE, mode=MODE_REMOTE).inc()
         # Record latency even for failures
    -    rag_embedding_latency.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).observe(time.time() - start_time)
    +    rag_embedding_latency.labels(status=STATUS_FAILURE, mode=MODE_REMOTE).observe(time.time() - start_time)
         raise e
    
    Suggestion importance[1-10]: 8

    __

    Why: The suggestion correctly identifies and fixes an issue where the status label is incorrectly set to STATUS_SUCCESS instead of STATUS_FAILURE when recording failed embeddings. This is a critical bug fix that ensures accurate metric reporting.

    Medium
    Suggestions up to commit a3420b2
    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Fix incorrect status on exception

    Update the status to failure when an exception occurs.

    presets/ragengine/metrics/helpers.py [37-40]

     except Exception as e:
         # Record failed embedding
    -    rag_embedding_requests_total.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).inc()
    +    rag_embedding_requests_total.labels(status=STATUS_FAILURE, mode=MODE_REMOTE).inc()
         # Record latency even for failures
    -    rag_embedding_latency.labels(status=STATUS_SUCCESS, mode=MODE_REMOTE).observe(time.time() - start_time)
    +    rag_embedding_latency.labels(status=STATUS_FAILURE, mode=MODE_REMOTE).observe(time.time() - start_time)
         raise e
    
    Suggestion importance[1-10]: 8

    __

    Why: The suggestion correctly updates the status to failure when an exception occurs, which is a critical fix for accurate metrics collection.

    Medium
    Fix metric name typo

    Correct the typo in the metric name.

    presets/ragengine/metrics/prometheus_metrics.py [41]

    -rag_indexes_document_requests_total = Counter('rag_indexes_document_requests_totall', 'Count of successful/failed calling \'/indexes/{index_name}/documents\' requests', labelnames=[STATUS_LABEL])
    +rag_indexes_document_requests_total = Counter('rag_indexes_document_requests_total', 'Count of successful/failed calling \'/indexes/{index_name}/documents\' requests', labelnames=[STATUS_LABEL])
    
    Suggestion importance[1-10]: 8

    __

    Why: The suggestion fixes a typo in the metric name, which is important for correct metric tracking and reporting.

    Medium

    kaito-pr-agent[bot] avatar Apr 04 '25 10:04 kaito-pr-agent[bot]

    Codecov Report

    Attention: Patch coverage is 82.40000% with 22 lines in your changes missing coverage. Please review.

    @@            Coverage Diff             @@
    ##             main     #988      +/-   ##
    ==========================================
    + Coverage   58.91%   59.28%   +0.36%     
    ==========================================
      Files          69       71       +2     
      Lines        7507     7623     +116     
    ==========================================
    + Hits         4423     4519      +96     
    - Misses       2865     2885      +20     
      Partials      219      219              
    
    Components Coverage Δ
    workspace 52.48% <ø> (ø)
    presets 87.41% <82.40%> (-0.35%) :arrow_down:
    main ∅ <ø> (∅)
    :rocket: New features to boost your workflow:
    • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

    codecov[bot] avatar Apr 04 '25 11:04 codecov[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/12e32317ec89f44cd014cfbd2cf901a77d618d68

    kaito-pr-agent[bot] avatar Apr 04 '25 19:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/d9fdc7faf8d45a0ac24d34ff6e3c5d12abb6f435

    kaito-pr-agent[bot] avatar Apr 04 '25 19:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/69bef40213de6189421af3a8bc62b944c5c33461

    kaito-pr-agent[bot] avatar Apr 04 '25 20:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/6fe0998be91558fcd9e7a9f5237d8c5283dfb02b

    kaito-pr-agent[bot] avatar Apr 04 '25 20:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/17fffb46dd776010703f55f56897861e4e5dd5ec

    kaito-pr-agent[bot] avatar Apr 04 '25 23:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/03781c0ec647bd209e68f52278f4a987b3478e92

    kaito-pr-agent[bot] avatar Apr 05 '25 07:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/14817dd1808598f6321e62dcd0c5209efc1ec74c

    kaito-pr-agent[bot] avatar Apr 05 '25 07:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/c9c2f1f87a421df0c19816d7c00d220bcc3a85a7

    kaito-pr-agent[bot] avatar Apr 05 '25 08:04 kaito-pr-agent[bot]

    Persistent review updated to latest commit https://github.com/kaito-project/kaito/commit/05614c6b36c57e355450f71752635ec26c7bb5df

    kaito-pr-agent[bot] avatar Apr 05 '25 08:04 kaito-pr-agent[bot]

    prometheus client or other requirements -> requirements.txt

    ishaansehgal99 avatar May 08 '25 00:05 ishaansehgal99