User description
Summary
- add OpenTelemetry-based telemetry module for agents-api
- init telemetry in web app startup
- capture workflow metrics in Temporal client
- increment query metrics via telemetry counters
- include OTel dependencies
Testing
ruff format .
ruff check . (auto-fixed)
pyright agents_api (fails: many missing import errors)
ward (fails: command not found)
PR Type
Enhancement
Description
-
Introduce OpenTelemetry-based telemetry module for agents-api
- Add
telemetry.py for tiered telemetry setup and management
- Support configuration via environment and YAML config
- Integrate crash, metrics, and latency monitoring
-
Integrate telemetry into workflow and query metrics
- Increment telemetry counters in Temporal workflow and query handlers
-
Initialize telemetry during FastAPI app startup
-
Add OpenTelemetry dependencies to project requirements
Changes walkthrough 📝
| Relevant files |
|---|
| Enhancement |
telemetry.pyAdd OpenTelemetry-based telemetry module with tiered levels
agents-api/agents_api/telemetry.py
Add new module for OpenTelemetry-based telemetry management Support tiered telemetry levels and persistent instance ID Integrate crash, metrics, and latency monitoring middleware Provide initialization function for FastAPI app
|
+155/-0 |
web.pyInitialize telemetry in FastAPI app startup
agents-api/agents_api/web.py
- Import and initialize telemetry during FastAPI app startup
|
+3/-0 |
temporal.pyIntegrate telemetry workflow counter in Temporal client
agents-api/agents_api/clients/temporal.py
Import telemetry module Increment workflow counter after starting workflow if telemetry enabled
|
+7/-1 |
counters.pyIntegrate telemetry queries counter in metrics decorators
agents-api/agents_api/metrics/counters.py
Import telemetry module Increment queries counter in metric decorators if telemetry enabled
|
+6/-0 |
|
| Dependencies |
pyproject.tomlAdd OpenTelemetry dependencies to project requirements
agents-api/pyproject.toml
Add OpenTelemetry SDK, exporter, and FastAPI instrumentation to dependencies
|
+3/-0 |
|
Need help?
Type /help how to ... in the comments thread for any questions about Qodo Merge usage.Check out the documentation for more information.
[!IMPORTANT]
Add OpenTelemetry-based telemetry module to agents-api, capturing workflow and query metrics, and update dependencies.
- Telemetry Module:
- Adds
telemetry.py for OpenTelemetry-based telemetry in agents-api.
- Initializes telemetry in
web.py during app startup with init_telemetry().
- Metrics:
- Captures workflow metrics in
run_task_execution_workflow() in temporal.py.
- Increments query metrics in
counters.py using telemetry counters.
- Dependencies:
- Adds
opentelemetry-sdk, opentelemetry-exporter-otlp, and opentelemetry-instrumentation-fastapi to pyproject.toml.
This description was created by
for a63b63683cec2ee45bc1b38f461dda39704fb4b6. You can customize this summary. It will automatically update as commits are pushed.
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
| ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪ |
| 🧪 No relevant tests |
| 🔒 No security concerns identified |
⚡ Recommended focus areas for review
Error Handling
The _load_config function doesn't handle YAML parsing errors, which could cause the application to crash if the config file is malformed.
def _load_config() -> dict:
if CONFIG_PATH.exists():
with open(CONFIG_PATH) as f:
return yaml.safe_load(f) or {}
return {}
Resource Leak
The latency_middleware function creates a trace span but doesn't properly clean it up, potentially causing resource leaks in high-traffic scenarios.
@self.app.middleware("http")
async def latency_middleware(request: Request, call_next):
start = trace.get_current_span().context.trace_id
del start # AIDEV-NOTE: trace id not needed; using for type check
start_time = time.monotonic()
response = await call_next(request)
duration = time.monotonic() - start_time
latency_hist.record(duration, {"route": request.url.path})
return response
Configuration Validation
The telemetry level is cast to int without validation, which could cause crashes if non-integer values are provided in the environment variable or config file.
def _determine_level(self) -> tuple[int, str]:
level_env = os.getenv("JULEP_TELEMETRY")
config = _load_config()
if level_env is not None:
level = int(level_env)
else:
level = int(config.get("telemetry", DEFAULT_LEVEL))
|
PR Code Suggestions ✨
Explore these optional code suggestions:
| Category | Suggestion | Impact |
| Possible issue |
Add error handling
The code attempts to convert potentially non-integer values to integers without error handling. If the environment variable or config value contains non-integer data, this will crash. Add error handling to gracefully handle invalid values.
agents-api/agents_api/telemetry.py [56-62]
def _determine_level(self) -> tuple[int, str]:
level_env = os.getenv("JULEP_TELEMETRY")
config = _load_config()
- if level_env is not None:
- level = int(level_env)
- else:
- level = int(config.get("telemetry", DEFAULT_LEVEL))
+ try:
+ if level_env is not None:
+ level = int(level_env)
+ else:
+ level = int(config.get("telemetry", DEFAULT_LEVEL))
+ except (ValueError, TypeError):
+ logger.warning("Invalid telemetry level, using default")
+ level = DEFAULT_LEVEL
Suggestion importance[1-10]: 8
__
Why: Adding error handling for parsing the telemetry level prevents potential crashes from invalid configuration, which is important for robustness and reliability.
| Medium
|
| General |
Improve metrics documentation
The counters are created without any attributes or description, which limits their usefulness for analysis. Add descriptions and default attributes to provide context for the metrics.
agents-api/agents_api/telemetry.py [132-136]
def _setup_metrics(self) -> None:
meter = metrics.get_meter("julep.agents_api")
- self.queries_counter = meter.create_counter("julep.queries.count")
- self.workflows_counter = meter.create_counter("julep.workflows.count")
- latency_hist = meter.create_histogram("julep.request_latency.seconds")
+ self.queries_counter = meter.create_counter(
+ "julep.queries.count",
+ description="Number of queries processed"
+ )
+ self.workflows_counter = meter.create_counter(
+ "julep.workflows.count",
+ description="Number of workflows started"
+ )
+ latency_hist = meter.create_histogram(
+ "julep.request_latency.seconds",
+ description="HTTP request latency in seconds"
+ )
Suggestion importance[1-10]: 6
__
Why: Adding descriptions to metrics improves their clarity and usefulness for observability, but this is a moderate improvement rather than a critical fix.
| Low
|
Remove confusing variable usage
The code retrieves a trace ID but immediately deletes it with a comment indicating it's only for type checking. This is confusing and inefficient. Instead, directly check if tracing is active without creating unused variables.
agents-api/agents_api/telemetry.py [140-141]
-start = trace.get_current_span().context.trace_id
-del start # AIDEV-NOTE: trace id not needed; using for type check
+# Check if tracing is active without creating unused variables
+_ = trace.get_current_span().context.trace_id # Verify tracing is initialized
Suggestion importance[1-10]: 4
__
Why: The suggestion removes an unnecessary variable assignment and deletion, making the code slightly clearer, but the impact is minor as this is a non-functional change.
| Low
|
|
| |
CI Feedback 🧐
(Feedback updated until commit https://github.com/julep-ai/julep/commit/14bcb2276bce2066f704be94b6a95c1dee5310da)
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
|
Action: Test |
|
Failed stage: Run tests [❌]
|
|
Failure summary:
The action failed due to an import error in the Python code. Specifically, in the file
/home/runner/work/julep/julep/agents-api/agents_api/telemetry.py at line 17, the code is trying to import OTLPMetricExporter from opentelemetry.sdk.metrics.export, but this name doesn't exist in that module. The error suggests that perhaps MetricExporter was the intended import instead.
|
Relevant error logs:
1: ##[group]Operating System
2: Ubuntu
...
965: prune-cache: true
966: ignore-nothing-to-cache: false
967: ##[endgroup]
968: Downloading uv from "https://github.com/astral-sh/uv/releases/download/0.7.5/uv-x86_64-unknown-linux-gnu.tar.gz" ...
969: [command]/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /home/runner/work/_temp/9cb436d6-652d-4695-a027-0255c75f7e09 -f /home/runner/work/_temp/5bc7a123-9d6a-4f66-9949-bde67355889b
970: Added /opt/hostedtoolcache/uv/0.7.5/x86_64 to the path
971: Added /home/runner/.local/bin to the path
972: Set UV_CACHE_DIR to /home/runner/work/_temp/setup-uv-cache
973: Successfully installed uv version 0.7.5
974: Searching files using cache dependency glob: **/uv.lock
975: /home/runner/work/julep/julep/agents-api/uv.lock
976: /home/runner/work/julep/julep/cli/uv.lock
977: /home/runner/work/julep/julep/integrations-service/uv.lock
978: Found 3 files to hash.
979: Trying to restore uv cache from GitHub Actions cache with key: setup-uv-1-x86_64-unknown-linux-gnu-0.7.5-d92603d25acef1c08e643c37cc2475e5e190deb9690356b084828d60043a591f
980: ##[warning]Failed to restore: Cache service responded with 422
981: No GitHub Actions cache found for key: setup-uv-1-x86_64-unknown-linux-gnu-0.7.5-d92603d25acef1c08e643c37cc2475e5e190deb9690356b084828d60043a591f
...
1460: from .protocol.tasks import ExecutionInput, StepContext, StepOutcome
1461: File "/home/runner/work/julep/julep/agents-api/agents_api/common/protocol/tasks.py", line 27, in <module>
1462: from ...activities.pg_query_step import pg_query_step
1463: File "/home/runner/work/julep/julep/agents-api/agents_api/activities/pg_query_step.py", line 6, in <module>
1464: from .. import queries
1465: File "/home/runner/work/julep/julep/agents-api/agents_api/queries/__init__.py", line 9, in <module>
1466: from . import agents as agents
1467: File "/home/runner/work/julep/julep/agents-api/agents_api/queries/agents/__init__.py", line 13, in <module>
1468: from .create_agent import create_agent
1469: File "/home/runner/work/julep/julep/agents-api/agents_api/queries/agents/create_agent.py", line 15, in <module>
1470: from ...metrics.counters import query_metrics
1471: File "/home/runner/work/julep/julep/agents-api/agents_api/metrics/counters.py", line 8, in <module>
1472: from ..telemetry import telemetry
1473: File "/home/runner/work/julep/julep/agents-api/agents_api/telemetry.py", line 17, in <module>
1474: from opentelemetry.sdk.metrics.export import (
1475: ImportError: cannot import name 'OTLPMetricExporter' from 'opentelemetry.sdk.metrics.export' (/home/runner/work/julep/julep/agents-api/.venv/lib/python3.12/site-packages/opentelemetry/sdk/metrics/export/__init__.py). Did you mean: 'MetricExporter'?
1476: ##[error]Process completed with exit code 1.
1477: Post job cleanup.
|