Ryan McCormick
Hi @gulldan, `compose.py` doesn't currently support the TensorRT-LLM backend (DLIS-6397). You should be able to achieve something similar by using `build.py` with: ``` --backend tensorrtllm:r24.04 --backend python:r24.04 --backend onnxruntime:r24.04 ```...
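For reference, a minimal sketch of that `build.py` invocation (run from the root of the `triton-inference-server/server` repo; the release tags are just the ones mentioned above, so adjust them to match your target Triton version):
```
# Sketch: build an image containing the TensorRT-LLM, Python, and ONNX Runtime
# backends (tags r24.04 are examples from the comment above)
python3 build.py \
    --backend tensorrtllm:r24.04 \
    --backend python:r24.04 \
    --backend onnxruntime:r24.04
```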
@nvda-mesharma what's the status of this PR? Still wanted?
Hi @riZZZhik, Thanks for raising this request. A similar issue, https://github.com/triton-inference-server/server/issues/3765, is looking specifically for JSON or XML support. Are you looking for the same, or are you looking to provide...
Hi @riZZZhik, there are no plans at this time to allow an arbitrary format. We'd likely need to pre-define a set of acceptable fields before allowing users to specify the...
Hi @janpetrov @Elissa0723, the "disregarding" of temperature should be fixed with this PR: https://github.com/triton-inference-server/tensorrtllm_backend/pull/578. You should see that `temperature` is now correctly passed when using the BLS model: https://github.com/triton-inference-server/tensorrtllm_backend/blob/edf17484f98e64d0ec1d267323d3a478d72decdb/all_models/inflight_batcher_llm/tensorrt_llm_bls/1/lib/triton_decoder.py#L401,...
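If you want to verify the fix, here's a rough sketch of passing `temperature` through the HTTP generate endpoint to the BLS model (the model name and port are assumptions based on a default `tensorrt_llm_bls` deployment):
```
# Sketch: JSON fields map to the BLS model's input tensors; model name and
# port assume a default deployment
curl -s -X POST localhost:8000/v2/models/tensorrt_llm_bls/generate \
  -d '{"text_input": "What is Triton?", "max_tokens": 64, "temperature": 0.2}'
```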
Hi @aaditya-srivathsan, I've seen some similar [issues](https://github.com/triton-inference-server/tensorrtllm_backend/issues/390#issuecomment-2047864548) reported that were solved by setting `--use_custom_all_reduce disable`. Can you try this to see if it helps?
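If it helps, a rough sketch of where that flag goes when building the engine (paths are placeholders, and the exact build entrypoint can differ across TensorRT-LLM versions):
```
# Sketch: build the TRT-LLM engine with the custom all-reduce plugin disabled
# (checkpoint/output paths are placeholders)
trtllm-build \
    --checkpoint_dir /path/to/trtllm_checkpoint \
    --output_dir /path/to/engine_dir \
    --use_custom_all_reduce disable
```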
Hi @conway-abacus, could you try doing everything (both engine building and starting Triton) in this image: `nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3`? This should align the engine-build and runtime versions to TRT-LLM v0.9.0.
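Roughly something like this (a sketch; the mounted workspace path is a placeholder), so the engine build and `tritonserver` both run against the same TRT-LLM install:
```
# Sketch: build the engine and serve from the same container so the TRT-LLM
# versions used at build time and runtime match
docker run --rm -it --gpus all \
  -v $(pwd):/workspace \
  nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
```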
Please see this new location for details on an OpenAI-compatible Frontend for Triton: https://github.com/triton-inference-server/server/tree/main/python/openai
Hi @ShuaiShao93, thanks for raising this issue! Could you share minimal reproducer models (your BLS model, plus the other model that just immediately returns an error) for a quick repro? CC...
Hi @surprisedPikachu007, the broken onnx model link was recently fixed here: https://github.com/triton-inference-server/server/pull/7621. Please try the new link.