daniel-salib
Accurate real-time request concurrency tracking. Added the /load API to retrieve the real-time concurrency count. Benchmark latency_results.json with load tracking enabled: ` { "avg_latency": 0.21240155203461958, "latencies": [ 0.21297942200908437, 0.2120011480001267, 0.2135092600074131,...
The background task that decrements server_load can only trigger if there is a response. If the connection is terminated (i.e. canceled or timed out), then we need to ensure server load is...
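The failure mode described above can be illustrated with a minimal sketch, which is not the code from this change. It assumes a plain asyncio handler and uses hypothetical names (`LoadTracker`, `track`, `handle_request`, `demo`); only `server_load` comes from the description. Putting the decrement in a `finally` block, rather than in a task that runs only after a response is produced, means it also runs when the request is canceled or times out.

```python
import asyncio
from contextlib import asynccontextmanager


class LoadTracker:
    """Hypothetical in-flight request counter (illustrative, not the project's class)."""

    def __init__(self) -> None:
        self.server_load = 0

    @asynccontextmanager
    async def track(self):
        self.server_load += 1
        try:
            yield
        finally:
            # Runs on normal completion, on exceptions, and on cancellation,
            # so the counter cannot leak when no response is ever sent.
            self.server_load -= 1


async def handle_request(tracker: LoadTracker, seconds: float) -> str:
    # Stand-in for real request processing.
    async with tracker.track():
        await asyncio.sleep(seconds)
        return "ok"


async def demo() -> tuple[int, int, int]:
    tracker = LoadTracker()

    # Case 1: the client disconnects and the handler task is canceled.
    task = asyncio.create_task(handle_request(tracker, 10.0))
    await asyncio.sleep(0)  # let the task start and increment the counter
    during = tracker.server_load
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    after_cancel = tracker.server_load

    # Case 2: the request exceeds a timeout.
    try:
        await asyncio.wait_for(handle_request(tracker, 10.0), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    after_timeout = tracker.server_load

    return during, after_cancel, after_timeout
```

Running `asyncio.run(demo())` should return `(1, 0, 0)`: the counter is 1 while the request is in flight and returns to 0 after both the cancellation and the timeout.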
## Purpose

This change enables streaming support for MCP tools when using GPT OSS. It extends the harmony utilities and response serving infrastructure to handle tool streaming, allowing tool calls...