daniel-salib
@bbrowning thanks for catching that - yeah, I saw function calls were using it so I carried it over, but I realize we're not pairing the MCP calls to any output...
Thanks for the review @alecsolder! I was able to remove the extra checks and simplify the logic. I also updated the unit tests and caught an issue with multi-turn MCP requests...
Will break the current PR into smaller chunks and open new PRs for them.
Thanks for the reviews :) Made another pass taking all the feedback into consideration
Thanks for the review @youngkent! Resolved all the comments and added a unit test for the /load route. I also attached the benchmark results showing the latency comparison before and after...
@youngkent I ran benchmark_serving.py twice, with and without the middleware, and added the results to the PR description. Is it safe to assume the discrepancy between the runs is due to random +/-...
@youngkent updated the PR description with the benchmark comparison across 20,000 requests per group
@youngkent I was previously using the random dataset when benchmarking, and thought that might have affected the variance. I updated the description with the results after I switched...
Thanks for the review @robertgshaw2-redhat! I ran the performance test on 2 x H100 GPUs.
@youngkent took a different approach that should be much better performance-wise. Updated the description to include the latest benchmarks