Flex Wang

Results 4 comments of Flex Wang

Why would removing the type annotation help? Or maybe I should ask: why would using `pbu.InferenceRequest` cause an issue?

Will this ever work? I didn't see `llama` defined under: https://github.com/NVIDIA/FasterTransformer/tree/main/src/fastertransformer/triton_backend

https://github.com/NVIDIA/FasterTransformer/pull/725