daniel-salib

Results: 15 comments by daniel-salib

@bbrowning thanks for catching that. Yeah, I saw function calls were using it so I carried it over, but I realize we're not pairing the MCP calls with any output...

Thanks for the review @alecsolder! I was able to remove the extra checks and simplify the logic. I also updated the unit tests and caught an issue with multi-turn MCP requests...

Will create new PRs that break the current PR into smaller chunks.

Thanks for the reviews :) Made another pass, taking all the feedback into consideration.

Thanks for the review @youngkent! I resolved all the comments and added a unit test for the /load route. I also attached the benchmark results showing the latency comparison before and after...

@youngkent I ran benchmark_serving.py twice, with and without the middleware, and added the results to the PR description. Is it safe to assume the discrepancy between the runs is due to random +/-...

@youngkent I updated the PR description with the benchmark comparison across 20,000 requests per group.

@youngkent I was previously using the random dataset when benchmarking, and I thought that might affect the variance. I updated the description with the results after I switched...

Thanks for the review @robertgshaw2-redhat! I ran the performance test on 2x H100 GPUs.

@youngkent I took a different approach that should perform much better. I updated the description to include the latest benchmarks.