jayakommuru
Hi @byshiue, is sequence classification with T5 models not supported yet?
@taozhang9527 @byshiue I am running `make -C docker release_run LOCAL_USER=1` but am still facing this error:

```
pull access denied for tensorrt_llm/release, repository does not exist or may require 'docker login': denied:...
```
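In case it helps anyone hitting the same message: a minimal sketch of one likely cause, assuming the `tensorrt_llm/release` image was never built locally. That image is not published on a registry, so `release_run` can only find it in the local Docker cache; building it first with the repo's `release_build` target (per the TensorRT-LLM `docker/Makefile`; target names may differ across versions) should avoid the pull attempt.

```
# Assumption: the pull fails because tensorrt_llm/release only exists as a
# locally built image, not on Docker Hub. Build it first, then run it.
make -C docker release_build LOCAL_USER=1   # builds tensorrt_llm/release into the local cache
make -C docker release_run LOCAL_USER=1     # now resolves the image locally instead of pulling
```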
@LoverLost were you able to figure this out?
@gabriel-peracio @hcnhcn012 @MrD005 were you able to find a fix for this?
@byshiue @schetlur-nv can you help with this? I am not able to deploy the basic t5-small model following the instructions given in https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/encoder_decoder.md
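For context, a condensed sketch of the sort of build steps that guide describes, assuming the `enc_dec` example layout in TensorRT-LLM (script names and flags vary between versions, and the paths here are placeholders; the linked doc is authoritative):

```
# 1. Convert the HF checkpoint into TRT-LLM format (produces encoder/ and decoder/).
python3 examples/enc_dec/convert_checkpoint.py --model_type t5 \
    --model_dir t5-small --output_dir /tmp/trt_models/t5-small --dtype float16
# 2. Build one engine per component; extra plugin flags may be required by your version.
trtllm-build --checkpoint_dir /tmp/trt_models/t5-small/encoder \
    --output_dir /tmp/trt_engines/t5-small/encoder
trtllm-build --checkpoint_dir /tmp/trt_models/t5-small/decoder \
    --output_dir /tmp/trt_engines/t5-small/decoder
```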
@oandreeva-nv can you help with this ^^?
@oandreeva-nv OK. Can there be any throughput/performance benefit from running an FP8 TRT engine file with FP16 I/O? Which Triton data type should be used with an FP8 TRT engine file in...
@oandreeva-nv can you confirm whether using FP16 I/O Triton data types with an FP8 TRT engine gives any benefit? Thanks
@oandreeva-nv Sure, I will explore perf_analyzer. Any idea whether to use the FP32 or FP16 I/O data type in Triton for TensorRT FP8 models?
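In case it is useful, one way to answer this is to inspect the engine's actual I/O bindings and mirror them in `config.pbtxt` — FP8 is typically internal to the engine (weights/compute), while the input/output tensors stay FP16 or FP32. A minimal sketch using Polygraphy and trtexec, assuming they are installed and `model.plan` is a placeholder name for the FP8 engine file:

```
# Print the engine's I/O tensors with their dtypes and shapes; the Triton
# datatypes in config.pbtxt (e.g. TYPE_FP16 vs TYPE_FP32) should match these.
polygraphy inspect model model.plan
# Alternatively, trtexec can load the engine and log its bindings verbosely.
trtexec --loadEngine=model.plan --verbose
```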