Funtowicz Morgan
Tentative CLI interface: `optimum benchmark push -m TIME_TO_FIRST_TOKEN -k latency -v 123.4 -t --meta commit_id=saflsfkja3115 --meta model_id=meta-llama/Meta-Llama-3.1-8B-Instruct --meta dtype=float16 [...] es+aws://benchmarks-k[...]deny.us-east-1.es.amazonaws.com`
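To make the proposed flags a bit more concrete, here is a minimal sketch of how such a command line could be parsed with Python's `argparse`. This is only an illustration under assumptions, not the actual `optimum` implementation: the long option names (`--metric`, `--kind`, `--value`), the `parse_meta` helper, and the placeholder target URL are all hypothetical, and the `-t` flag is left out because its meaning is not stated above.

```python
import argparse


def parse_meta(pairs):
    """Turn ["key=value", ...] into a dict, splitting on the first '=' only."""
    meta = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        meta[key] = value
    return meta


def build_parser():
    # Hypothetical parser mirroring the tentative flags; long names are assumptions.
    parser = argparse.ArgumentParser(prog="benchmark-push-sketch")
    parser.add_argument("-m", "--metric", required=True)
    parser.add_argument("-k", "--kind", required=True)
    parser.add_argument("-v", "--value", type=float, required=True)
    # Each --meta occurrence appends one "key=value" string to a list.
    parser.add_argument("--meta", action="append", default=[])
    parser.add_argument("target")
    return parser


# Example invocation with a placeholder target (the real URL is elided above).
args = build_parser().parse_args([
    "-m", "TIME_TO_FIRST_TOKEN",
    "-k", "latency",
    "-v", "123.4",
    "--meta", "model_id=meta-llama/Meta-Llama-3.1-8B-Instruct",
    "--meta", "dtype=float16",
    "es+aws://example.invalid",
])
meta = parse_meta(args.meta)
```

Splitting on the first `=` only keeps values that themselves contain `=` intact, and collecting `--meta` with `action="append"` lets the flag be repeated freely.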
Hi @michaelthreet - thanks for your interest in the TRTLLM backend. The backend as a whole is pretty new and may not handle some edge cases yet, but it should be...
Awesome to hear it built successfully, and cool that you were able to figure out the required adaptations 😍. Effectively, TensorRT-LLM engines are not necessarily compatible from one release to another...
Argh, interesting... I'm developing with the same model and haven't got this output. Anyway, I'm going to dig into it tomorrow morning and will report back here, sorry for the inconvenience @michaelthreet
Sorry for the delay @michaelthreet, I got sidetracked by something else. Going to take a look tomorrow, thanks a ton for the additional inputs. Reporting back here shortly 🤗
`healthRoute` is available at the root of the endpoint description payload
Maybe we should split the two scopes that we are addressing here:
- Focus this PR on making GHA argument secrets and not ENV
- Open a second PR to...
I went with `~/.mcp` to have something that can be easily implemented without branching on platform. @julien-c if we are fine with the branching, I totally think we can do something...