Flex Wang
Results: 4 comments by Flex Wang
Why would removing the type annotation help? Or maybe I should ask: why would using `pbu.InferenceRequest` cause an issue?
Will this ever work? I didn't see `llama` defined under: https://github.com/NVIDIA/FasterTransformer/tree/main/src/fastertransformer/triton_backend
https://github.com/NVIDIA/FasterTransformer/pull/725
@byshiue @dwyatte mind taking a look?