create a span when a client or server tool is called
🚀 Describe the new functionality needed
I would like to use OpenTelemetry tracing to track my agent's activities. I created the following agent and registered a local `calculator` client tool as well as the `builtin::websearch` tool configured on llama-stack.
Agent
```python
agent = Agent(
    client,
    model=MODEL_ID,
    instructions="You are a helpful assistant. Use tools when necessary.",
    sampling_params={
        "strategy": {"type": "top_p", "temperature": 1.0, "top_p": 0.9},
    },
    tools=[calculator, "builtin::websearch"],
)
```
Calculator
```python
import sys

# Import path as used in the llama-stack-client Python SDK examples.
from llama_stack_client.lib.agents.client_tool import client_tool
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


@client_tool
def calculator(x: float, y: float, operation: str) -> dict:
    """
    Perform a basic arithmetic operation on two numbers.

    :param x: First number
    :param y: Second number
    :param operation: The operation to perform: 'add', 'subtract', 'multiply', or 'divide'
    :returns: A dictionary with keys 'success' and either 'result' or 'error'
    """
    with tracer.start_as_current_span("tool.calculator") as span:
        span.set_attribute("calculator.x", x)
        span.set_attribute("calculator.y", y)
        span.set_attribute("calculator.operation", operation)
        print(f"Call calculator: {x} {operation} {y}", file=sys.stdout, flush=True)
        try:
            if operation == "add":
                result = x + y
            elif operation == "subtract":
                result = x - y
            elif operation == "multiply":
                result = x * y
            elif operation == "divide":
                if y == 0:
                    error_msg = "Cannot divide by zero"
                    span.set_attribute("calculator.error", error_msg)
                    return {"success": False, "error": error_msg}
                result = x / y
            else:
                error_msg = "Invalid operation"
                span.set_attribute("calculator.error", error_msg)
                return {"success": False, "error": error_msg}
            span.set_attribute("calculator.result", result)
            return {"success": True, "result": result}
        except Exception as e:
            error_msg = str(e)
            span.set_attribute("calculator.exception", error_msg)
            return {"success": False, "error": error_msg}
```
The agent is exposed via an OpenAI-compatible endpoint, which is why I can then do a turn using curl:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vllm",
    "messages": [
      {"role": "user", "content": "What is 40+30?"}
    ]
  }'
```
This results in a calculator call and the following response:
{"id":"chatcmpl-1234","object":"chat.completion","created":1747758297,"model":"vllm","choices":[{"index":0,"message":{"role":"assistant","content":"[{\"name\": \"calculator\", \"arguments\": {\"x\": 40, \"y\": 30, \"operation\": \"add\"}}]"}
Inspecting the llama-stack traces, I don't see a span indicating that the calculator or the websearch tool has been triggered. Creating a separate span for each tool execution would help in understanding the execution flow (see the sketch below).
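A rough sketch of what this could look like at llama-stack's tool-dispatch point; the `invoke_tool` callable and the attribute keys are illustrative assumptions, not actual llama-stack internals:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llama_stack.tool_runtime")


async def invoke_tool_with_span(tool_name: str, kwargs: dict, invoke_tool):
    # Hypothetical wrapper around the tool dispatch; names are assumptions.
    with tracer.start_as_current_span(f"tool_execution {tool_name}") as span:
        span.set_attribute("tool.name", tool_name)
        span.set_attribute("tool.arguments", str(kwargs))
        try:
            result = await invoke_tool(tool_name, kwargs)
            span.set_attribute("tool.status", "success")
            return result
        except Exception as e:
            span.record_exception(e)
            span.set_attribute("tool.status", "error")
            raise
```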
💡 Why is this needed? What if we don't build it?
If we don't build it, I have to identify myself where tools are called inside llama-stack and create a separate OpenTelemetry span for these operations on my own.
Other thoughts
It would be even better if the tool execution on llama-stack propagated the trace context to the tool that is called (a sketch follows the link below). I assume that somehow belongs to:
- https://github.com/meta-llama/llama-stack/issues/2154
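As a sketch of the caller side, llama-stack could inject the current trace context into the outgoing tool call via the configured propagator; the remote tool URL and payload shape here are made-up examples:

```python
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)


def call_remote_tool(url: str, payload: dict) -> dict:
    with tracer.start_as_current_span("tool_call") as span:
        # Inject the current trace context (e.g. the W3C traceparent header)
        # into the outgoing request so the tool can continue the same trace.
        headers: dict = {}
        inject(headers)
        return requests.post(url, json=payload, headers=headers).json()


# Hypothetical endpoint; URL and payload are assumptions for illustration.
# call_remote_tool("http://localhost:9000/tools/calculator",
#                  {"x": 40, "y": 30, "operation": "add"})
```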