Seung Ho Jang


Hello, I wonder if #584 also applies to GPT-J? I am testing inference with Triton Inference Server's FasterTransformer backend and a converted GPT-J model, and the inference time grows in proportion...
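
For context, a minimal sketch of the kind of timing test described above, using the `tritonclient` Python package. The model name (`fastertransformer`) and tensor names (`input_ids`, `input_lengths`, `request_output_len`) are assumptions based on the stock fastertransformer_backend GPT-J config and may differ in a given deployment:

```python
# Probe how latency scales with requested output length against a Triton
# FasterTransformer GPT-J model. Model/tensor names are assumptions; check
# the deployment's config.pbtxt before use.
import time

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Pre-tokenized prompt (token IDs here are placeholders).
prompt_ids = np.array([[818, 428, 2050]], dtype=np.uint32)

for out_len in (32, 64, 128, 256):
    inputs = [
        httpclient.InferInput("input_ids", list(prompt_ids.shape), "UINT32"),
        httpclient.InferInput("input_lengths", [1, 1], "UINT32"),
        httpclient.InferInput("request_output_len", [1, 1], "UINT32"),
    ]
    inputs[0].set_data_from_numpy(prompt_ids)
    inputs[1].set_data_from_numpy(np.array([[prompt_ids.shape[1]]], dtype=np.uint32))
    inputs[2].set_data_from_numpy(np.array([[out_len]], dtype=np.uint32))

    start = time.perf_counter()
    client.infer("fastertransformer", inputs)
    print(f"output_len={out_len}: {time.perf_counter() - start:.3f}s")
```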

@devin12422 No; since FasterTransformer is deprecated and TensorRT-LLM has succeeded it, I just used tensorrtllm_backend instead, and it seemed to work fine.
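
For anyone landing here later, a minimal sketch of querying a Triton server running tensorrtllm_backend via `tritonclient`. The `ensemble` model name and the `text_input` / `max_tokens` / `text_output` tensor names are assumptions based on the example configs shipped with the tensorrtllm_backend repo and may differ in your setup:

```python
# Minimal client call against a Triton server running tensorrtllm_backend.
# Model and tensor names ("ensemble", "text_input", "max_tokens",
# "text_output") are assumptions from the repo's example ensemble configs;
# adjust them to match your config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# TYPE_STRING tensors are passed as numpy object arrays of str/bytes.
text = np.array([["Hello, GPT-J"]], dtype=object)
max_tokens = np.array([[64]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", list(text.shape), "BYTES"),
    httpclient.InferInput("max_tokens", list(max_tokens.shape), "INT32"),
]
inputs[0].set_data_from_numpy(text)
inputs[1].set_data_from_numpy(max_tokens)

result = client.infer("ensemble", inputs)
print(result.as_numpy("text_output"))
```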