Seung Ho Jang
Hello, I wonder if #584 also applies to GPT-J. I am testing inference with Triton Inference Server's FasterTransformer backend and a converted GPT-J model, and the latency grows in proportion...
@devin12422 No. Since FasterTransformer is deprecated and TensorRT-LLM succeeded it, I just used tensorrtllm_backend instead, and it seemed to work fine.
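For reference, a minimal sketch of what a request to a Triton server running tensorrtllm_backend might look like. This builds a KServe v2 JSON payload by hand; the tensor names `text_input` and `max_tokens` follow the example ensemble config shipped with tensorrtllm_backend and may differ in your deployment, and `ensemble` is an assumed model name.

```python
import json


def build_infer_request(prompt: str, max_tokens: int = 64) -> dict:
    """Build a KServe v2 inference payload for Triton.

    Tensor names ("text_input", "max_tokens") are assumptions based on
    the tensorrtllm_backend example ensemble; check your model's config.
    """
    return {
        "inputs": [
            {
                "name": "text_input",
                "shape": [1, 1],
                "datatype": "BYTES",
                "data": [prompt],
            },
            {
                "name": "max_tokens",
                "shape": [1, 1],
                "datatype": "INT32",
                "data": [max_tokens],
            },
        ]
    }


payload = build_infer_request("Hello, GPT-J!", max_tokens=32)
# POST this JSON to http://<host>:8000/v2/models/ensemble/infer
# on a running Triton server (endpoint path per the KServe v2 protocol).
print(json.dumps(payload, indent=2))
```

This only constructs the request body; sending it requires a running Triton instance, so the HTTP call is left as a comment.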
@CharlieFRuan That's great, thanks.