Add metrics scraping for additional services

Open creatorrr opened this issue 6 months ago • 3 comments

User description

Summary

  • include Temporal, LiteLLM and Traefik scrape jobs in Prometheus config
  • document new scrape targets in monitoring README

Testing

  • ruff format
  • ruff check

PR Type

enhancement, documentation


Description

  • Add Prometheus scrape jobs for Temporal, LiteLLM, and Traefik services.

  • Update Prometheus config to separate and clarify scrape targets.

  • Document new scrape targets in monitoring README.


Changes walkthrough 📝

Relevant files
Enhancement
prometheus.yml
Add and organize Prometheus scrape jobs for new services 

monitoring/prometheus/config/prometheus.yml

  • Added separate scrape jobs for Temporal, LiteLLM, and Traefik.
  • Removed Temporal from agents-api scrape targets.
  • Clarified and organized scrape_configs for better maintainability.
  • Included both standard and managed variants for new services.
  • +43/-9

Documentation
README.md
Document new Prometheus scrape targets in README

monitoring/README.md

  • Documented new Prometheus scrape targets: Temporal, LiteLLM, Traefik.
  • Explained ports and purpose for each new target.
  • Clarified that dashboards will include new metrics automatically.
  • +10/-0   

  • [!IMPORTANT] Add Prometheus scrape jobs for Temporal, LiteLLM, and Traefik, and update documentation.

    • Prometheus Configuration:
      • Add temporal scrape job in prometheus.yml for temporal:15000 and temporal-managed:15000.
      • Add litellm scrape job in prometheus.yml for litellm:4000 and litellm-managed:4000.
      • Add traefik scrape job in prometheus.yml for gateway:8082.
    • Documentation:
      • Update README.md to include new scrape targets: Temporal, LiteLLM, and Traefik.

    This description was created by Ellipsis for 2b778434a4c506e1c822dedff611edc3c4d0b0d6. It will automatically update as commits are pushed.

    creatorrr • May 20 '25 14:05

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Service Availability

    The PR adds new scrape targets for both standard and managed variants of services. Verify that all these services actually exist in the environment and are accessible at the specified ports.

        - targets: ['temporal:15000', 'temporal-managed:15000']
    
    # AIDEV-NOTE: LiteLLM metrics endpoint
    - job_name: litellm
      honor_timestamps: true
      scrape_interval: 5s
      scrape_timeout: 3s
      metrics_path: /metrics
      scheme: http
      follow_redirects: true
      static_configs:
        - targets: ['litellm:4000', 'litellm-managed:4000']
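One way to act on the availability check above is a small reachability probe. This is a minimal sketch, assuming it runs from a container attached to the same Docker network as Prometheus so that the service names resolve. The target list is copied from the scrape jobs in this PR; the helper names `split_target`, `is_reachable`, and `report` are hypothetical.

```python
import socket
from typing import Dict, Iterable, Tuple

# Targets copied from the scrape jobs described in this PR.
TARGETS = [
    "temporal:15000",
    "temporal-managed:15000",
    "litellm:4000",
    "litellm-managed:4000",
    "gateway:8082",
]


def split_target(target: str) -> Tuple[str, int]:
    """Split a Prometheus 'host:port' target into (host, port)."""
    host, _, port = target.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {target!r}")
    return host, int(port)


def is_reachable(target: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the target succeeds."""
    host, port = split_target(target)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def report(targets: Iterable[str]) -> Dict[str, bool]:
    """Map each target to whether its port accepted a connection."""
    return {t: is_reachable(t) for t in targets}


if __name__ == "__main__":
    for target, ok in report(TARGETS).items():
        print(f"{'UP  ' if ok else 'DOWN'} {target}")
```

An open port only shows that something is listening; it does not confirm that the service serves Prometheus metrics at /metrics, so a DOWN result is more informative than an UP one.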
    

    CI Feedback 🧐

    A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

    Action: Typecheck

    Failed stage: Generate openapi code [❌]

    Failure summary:

    The action failed due to a dependency conflict in the npm packages. Specifically:

  • There's a conflict between different versions of @typespec/compiler:
    - The root project requires @typespec/compiler@0.61.x
    - But typespec-http-new (installed from npm:@typespec/http@^1.0.1) has a peer dependency on @typespec/compiler@^1.0.0

  • npm couldn't resolve this conflict and exited with error code 1 (lines 214-242)
  • The error suggests fixing the upstream dependency conflict or retrying with the --force or --legacy-peer-deps flag
  • Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    150:  prune-cache: true
    151:  ignore-nothing-to-cache: false
    152:  ##[endgroup]
    153:  Downloading uv from "https://github.com/astral-sh/uv/releases/download/0.7.6/uv-x86_64-unknown-linux-gnu.tar.gz" ...
    154:  [command]/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /home/runner/work/_temp/69e367f4-90bc-48d3-a96b-41a5d77ac065 -f /home/runner/work/_temp/0343e3e3-0156-402e-a48d-4ff68baedb8b
    155:  Added /opt/hostedtoolcache/uv/0.7.6/x86_64 to the path
    156:  Added /home/runner/.local/bin to the path
    157:  Set UV_CACHE_DIR to /home/runner/work/_temp/setup-uv-cache
    158:  Successfully installed uv version 0.7.6
    159:  Searching files using cache dependency glob: **/uv.lock
    160:  /home/runner/work/julep/julep/agents-api/uv.lock
    161:  /home/runner/work/julep/julep/cli/uv.lock
    162:  /home/runner/work/julep/julep/integrations-service/uv.lock
    163:  Found 3 files to hash.
    164:  Trying to restore uv cache from GitHub Actions cache with key: setup-uv-1-x86_64-unknown-linux-gnu-0.7.6-d92603d25acef1c08e643c37cc2475e5e190deb9690356b084828d60043a591f
    165:  ##[warning]Failed to restore: Cache service responded with 422
    166:  No GitHub Actions cache found for key: setup-uv-1-x86_64-unknown-linux-gnu-0.7.6-d92603d25acef1c08e643c37cc2475e5e190deb9690356b084828d60043a591f
    ...
    
    199:  npm warn   @typespec/compiler@"0.61.x" from the root project
    200:  npm warn   7 more (@typespec/events, @typespec/http, @typespec/openapi, ...)
    201:  npm warn
    202:  npm warn Could not resolve dependency:
    203:  npm warn peer @typespec/compiler@"^1.0.0" from @typespec/[email protected]
    204:  npm warn node_modules/@typespec/asset-emitter
    205:  npm warn   @typespec/asset-emitter@"^0.70.0" from [email protected]
    206:  npm warn   node_modules/typespec-openapi3-new
    207:  npm warn
    208:  npm warn Conflicting peer dependency: @typespec/[email protected]
    209:  npm warn node_modules/@typespec/compiler
    210:  npm warn   peer @typespec/compiler@"^1.0.0" from @typespec/[email protected]
    211:  npm warn   node_modules/@typespec/asset-emitter
    212:  npm warn     @typespec/asset-emitter@"^0.70.0" from [email protected]
    213:  npm warn     node_modules/typespec-openapi3-new
    214:  npm error code ERESOLVE
    215:  npm error ERESOLVE could not resolve
    216:  npm error
    217:  npm error While resolving: @typespec/[email protected]
    218:  npm error Found: @typespec/[email protected]
    219:  npm error node_modules/@typespec/compiler
    220:  npm error   @typespec/compiler@"0.61.x" from the root project
    221:  npm error   peer @typespec/compiler@"~0.61.0" from @typespec/[email protected]
    222:  npm error   node_modules/@typespec/events
    223:  npm error     @typespec/events@"0.61.x" from the root project
    224:  npm error     peer @typespec/events@"~0.61.0" from @typespec/[email protected]
    225:  npm error     node_modules/@typespec/sse
    226:  npm error       @typespec/sse@"0.61.x" from the root project
    227:  npm error   6 more (@typespec/http, @typespec/openapi, @typespec/openapi3, ...)
    228:  npm error
    229:  npm error Could not resolve dependency:
    230:  npm error peer @typespec/compiler@"^1.0.0" from [email protected]
    231:  npm error node_modules/typespec-http-new
    232:  npm error   typespec-http-new@"npm:@typespec/http@^1.0.1" from the root project
    233:  npm error
    234:  npm error Conflicting peer dependency: @typespec/[email protected]
    235:  npm error node_modules/@typespec/compiler
    236:  npm error   peer @typespec/compiler@"^1.0.0" from [email protected]
    237:  npm error   node_modules/typespec-http-new
    238:  npm error     typespec-http-new@"npm:@typespec/http@^1.0.1" from the root project
    239:  npm error
    240:  npm error Fix the upstream dependency conflict, or retry
    241:  npm error this command with --force or --legacy-peer-deps
    242:  npm error to accept an incorrect (and potentially broken) dependency resolution.
    243:  npm error
    244:  npm error
    245:  npm error For a full report see:
    246:  npm error /home/runner/.npm/_logs/2025-05-20T19_06_38_933Z-eresolve-report.txt
    247:  npm error A complete log of this run can be found in: /home/runner/.npm/_logs/2025-05-20T19_06_38_933Z-debug-0.log
    248:  ##[error]Process completed with exit code 1.
    249:  Post job cleanup.
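The ERESOLVE failure in the log comes down to two ranges that no single version can satisfy: the root project's `0.61.x` and the peer requirement `^1.0.0`. The sketch below illustrates that with a deliberately simplified range check. It is not npm's resolver: it ignores prerelease tags and only handles caret ranges whose major version is at least 1. The concrete version `0.61.0` is an assumed stand-in for whichever 0.61 release was installed, since the log does not show it legibly.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_caret(version: str, base: str) -> bool:
    """Simplified npm caret range: >= base and < next major (major >= 1 only)."""
    v, b = parse(version), parse(base)
    if b[0] == 0:
        raise ValueError("this sketch only handles caret ranges with major >= 1")
    return b <= v < (b[0] + 1, 0, 0)


def satisfies_x_range(version: str, pattern: str) -> bool:
    """Simplified x-range such as '0.61.x': 'x' matches any number in that slot."""
    return all(
        want in ("x", "*") or want == have
        for have, want in zip(version.split("."), pattern.split("."))
    )


if __name__ == "__main__":
    # 0.61.0 is an assumed example; 1.0.0 is the lowest version allowed by ^1.0.0.
    for candidate in ("0.61.0", "1.0.0"):
        print(
            candidate,
            "root 0.61.x:", satisfies_x_range(candidate, "0.61.x"),
            "peer ^1.0.0:", satisfies_caret(candidate, "1.0.0"),
        )
```

Each candidate satisfies exactly one of the two ranges, which is why npm stops unless it is told to accept a mismatched tree with --force or --legacy-peer-deps.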
    
    

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category | Suggestion | Impact
    Possible issue
    Fix service name reference

    The Traefik metrics endpoint is typically exposed at /metrics but the service
    name should be traefik instead of gateway to match Traefik's default container
    name in most deployments. This mismatch could prevent metrics collection.

    monitoring/prometheus/config/prometheus.yml [48-57]

     # AIDEV-NOTE: Traefik gateway metrics endpoint
     - job_name: traefik
       honor_timestamps: true
       scrape_interval: 5s
       scrape_timeout: 3s
       metrics_path: /metrics
       scheme: http
       follow_redirects: true
       static_configs:
    -    - targets: ['gateway:8082']
    +    - targets: ['traefik:8082']
    
    Suggestion importance[1-10]: 7

    Why: The suggestion correctly identifies a likely mismatch between the service name 'gateway' and the expected 'traefik' for the Traefik metrics endpoint, which could prevent Prometheus from scraping metrics. This is a moderate-impact fix that ensures correct metrics collection but does not address a critical bug or security issue.
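Whether `gateway` or `traefik` is the right host depends on the service names in the deployment, which this thread does not show. Rather than guessing, the target health that Prometheus itself reports can settle it. The sketch below reads the `/api/v1/targets` endpoint of the Prometheus HTTP API and lists targets that are not `up`. The base URL `http://localhost:9090` and the sample payload (including its error text) are assumptions for illustration.

```python
import json
from typing import List, Tuple
from urllib.request import urlopen


def down_targets(payload: dict) -> List[Tuple[str, str, str]]:
    """Return (job, scrapeUrl, lastError) for each active target that is not 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            target.get("labels", {}).get("job", "?"),
            target.get("scrapeUrl", "?"),
            target.get("lastError", ""),
        )
        for target in active
        if target.get("health") != "up"
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch target status from a running Prometheus (base_url is an assumption)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hypothetical payload shaped like the API response, trimmed to the fields used here.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "litellm"},
                "scrapeUrl": "http://litellm:4000/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "traefik"},
                "scrapeUrl": "http://traefik:8082/metrics",
                "health": "down",
                "lastError": "hypothetical lookup failure",
            },
        ]
    },
}

if __name__ == "__main__":
    for job, url, error in down_targets(SAMPLE):
        print(f"{job}: {url} is down ({error})")
```

Against a live instance, `down_targets(fetch_targets())` would show whichever of the two host names fails to resolve or respond.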

    Medium

    ⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

    Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

    🔎 Detected hardcoded secret in your pull request
    GitGuardian id | GitGuardian status | Secret | Commit | Filename
    17693055 | Triggered | JSON Web Token | 3e9d8ae75a036a468f936c6183da4ce60eda3815 | cli/tests/test_auth.py
    🛠 Guidelines to remediate hardcoded secrets
    1. Understand the implications of revoking this secret by investigating where it is used in your code.
    2. Replace the secret and store it safely, following secret-management best practices.
    3. Revoke and rotate this secret.
    4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
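For a token that exists only to exercise test code, one option is to build it at test time instead of committing it. This is a minimal sketch using only the standard library, assuming the tests need a well-formed HS256 JWT rather than one issued by a real service. What cli/tests/test_auth.py actually requires is not shown in this thread, and the helper name `make_test_jwt` is hypothetical.

```python
import base64
import hashlib
import hmac
import json
import secrets
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_test_jwt(claims: dict, key: Optional[bytes] = None) -> str:
    """Build an HS256-signed JWT from claims, using a random key if none is given."""
    key = key or secrets.token_bytes(32)
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


if __name__ == "__main__":
    # The claim values here are placeholders, not real credentials.
    print(make_test_jwt({"sub": "test-user"}))
```

Because the key is random on each run, the resulting token is never valid outside the test process. If the flagged token was a real credential, it still needs to be revoked as described in the steps above.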



    gitguardian[bot] • Jun 07 '25 07:06