Ryan McCormick

157 comments of Ryan McCormick

Hi @rvroge, thanks for contributing the PR! CC'ing a couple folks who may have some more context @oandreeva-nv @nv-kmcgill53

Hi @chenchunhui97,

> generate onnx for server (with torch version 2.1.2)

If your `bert` model is an ONNX model, then you should be specifying the `onnxruntime` backend in the...
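For reference, a minimal sketch of what that could look like in the model's `config.pbtxt` (the model name, tensor names, types, and shapes below are hypothetical placeholders, not taken from the user's model):

```
# Sketch: config.pbtxt selecting the onnxruntime backend for an ONNX model.
# Tensor names, types, and shapes are placeholders; match your exported model.
name: "bert"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
```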

This seems like a reasonable and relatively simple request to me. @GuanLuo @nnshah1 what do you think?

Hi @chenchunhui97, this example is pretty old, but may work: https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/triton/README.md

Testing PR: https://github.com/triton-inference-server/server/pull/7756

Hi @Vaishnvi, thanks for sharing such detailed info. Since this is an ONNX model, and the [ORT backend supports full config auto-complete](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#auto-generated-model-configuration), can you try to load the model without...
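Roughly, that means dropping the `config.pbtxt` entirely and letting the ORT backend fill the config in; a sketch, with a placeholder repository path and model name (exact flags vary by Triton release):

```
# Sketch: a model repository containing only the ONNX file, no config.pbtxt:
#   model_repository/my_onnx_model/1/model.onnx
tritonserver --model-repository=/path/to/model_repository
# On older Triton releases, auto-complete may need to be enabled explicitly
# with --strict-model-config=false.
```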

Hi @ShuaiShao93, thanks for raising this issue. To clarify, are these errors causing the inference requests to fail? Or is it just logging errors without affecting inference? CC @krishung5 @indrajit96

Thanks for clarifying! I'm going to move this issue to the TRT-LLM Backend repo for further help.

Hi @jasonngap1, I'm transferring this issue to the TRT-LLM backend repo (https://github.com/triton-inference-server/tensorrtllm_backend) for further help. CC @schetlur-nv @pcastonguay

Hi @chorus-over-flanger @faileon, thanks for expressing interest in the `/embeddings` route! It's on our radar as another feature to add to the OpenAI frontend support (chat, completions, and models) added...
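For context, the routes that already exist follow the OpenAI API shape, so a request against the current chat route looks roughly like the sketch below (host, port, and model name are placeholders, not defaults from the frontend itself); an `/embeddings` route would presumably mirror the OpenAI embeddings request/response shape once added:

```
# Sketch against the existing OpenAI-compatible chat route; host/port and
# model name are placeholders.
curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my_model", "messages": [{"role": "user", "content": "Hello!"}]}'
```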