Ryan McCormick
Hi @gulldan, `compose.py` doesn't currently support the TensorRT-LLM backend (DLIS-6397). You should be able to achieve something similar by using `build.py` with: ``` --backend tensorrtllm:r24.04 --backend python:r24.04 --backend onnxruntime:r24.04 ```...
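For reference, a minimal sketch of that `build.py` invocation (run from the root of the `triton-inference-server/server` repo; the release tags are just the ones mentioned above, so adjust them to match your target Triton version):
```
# Sketch: build an image containing the TensorRT-LLM, Python, and ONNX Runtime
# backends (tags r24.04 are examples from the comment above)
python3 build.py \
    --backend tensorrtllm:r24.04 \
    --backend python:r24.04 \
    --backend onnxruntime:r24.04
```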
@nvda-mesharma what's the status of this PR? Still wanted?
Hi @riZZZhik, Thanks for raising this request. A similar issue, https://github.com/triton-inference-server/server/issues/3765, is looking specifically for JSON or XML support. Are you looking for the same, or are you looking to provide...
Hi @riZZZhik, there are no plans at this time to allow an arbitrary format. We'd likely need to pre-define a set of acceptable fields before allowing users to specify the...
Hi @janpetrov @Elissa0723, the "disregarding" of temperature should be fixed with this PR: https://github.com/triton-inference-server/tensorrtllm_backend/pull/578. You should see that `temperature` is now correctly passed when using the BLS model: https://github.com/triton-inference-server/tensorrtllm_backend/blob/edf17484f98e64d0ec1d267323d3a478d72decdb/all_models/inflight_batcher_llm/tensorrt_llm_bls/1/lib/triton_decoder.py#L401,...
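If you want to verify the fix, here's a rough sketch of passing `temperature` through the HTTP generate endpoint to the BLS model (the model name and port are assumptions based on a default `tensorrt_llm_bls` deployment):
```
# Sketch: JSON fields map to the BLS model's input tensors; model name and
# port assume a default deployment
curl -s -X POST localhost:8000/v2/models/tensorrt_llm_bls/generate \
  -d '{"text_input": "What is Triton?", "max_tokens": 64, "temperature": 0.2}'
```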
Hi @aaditya-srivathsan, I've seen some similar [issues](https://github.com/triton-inference-server/tensorrtllm_backend/issues/390#issuecomment-2047864548) reported that were solved by setting `--use_custom_all_reduce disable`. Can you try this to see if it helps?
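If it helps, a rough sketch of where that flag goes when building the engine (paths are placeholders, and the exact build entrypoint can differ across TensorRT-LLM versions):
```
# Sketch: build the TRT-LLM engine with the custom all-reduce plugin disabled
# (checkpoint/output paths are placeholders)
trtllm-build \
    --checkpoint_dir /path/to/trtllm_checkpoint \
    --output_dir /path/to/engine_dir \
    --use_custom_all_reduce disable
```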
Hi @conway-abacus, could you try doing everything (both engine building and starting Triton) in this image: `nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3`? This should align the engine-build and runtime versions to TRT-LLM v0.9.0.
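Roughly something like this (a sketch; the mounted workspace path is a placeholder), so the engine build and `tritonserver` both run against the same TRT-LLM install:
```
# Sketch: build the engine and serve from the same container so the TRT-LLM
# versions used at build time and runtime match
docker run --rm -it --gpus all \
  -v $(pwd):/workspace \
  nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
```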
Please see this new location for details on an OpenAI-compatible Frontend for Triton: https://github.com/triton-inference-server/server/tree/main/python/openai
Hi @ShuaiShao93, thanks for raising this issue! Could you share minimal reproducer models (your BLS model, plus the other model that just immediately returns an error) for a quick repro? CC...
Hi @surprisedPikachu007, the broken onnx model link was recently fixed here: https://github.com/triton-inference-server/server/pull/7621. Please try the new link.