FasterTransformer
Support for Falcon models
Since Falcon is a multi-query attention model, and FT doesn't support weight conversion for multi-query attention models, is support for this planned?
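For context, multi-query attention (the pattern Falcon uses) shares a single key/value head across all query heads, which is why a converter written for standard multi-head checkpoints doesn't apply directly. A minimal NumPy sketch of the idea (illustrative only, not FasterTransformer code; all names here are hypothetical):

```python
import numpy as np

def multi_query_attention(q, k, v):
    """Multi-query attention sketch: H query heads all attend
    against one shared key head and one shared value head.

    q: (H, T, d)  -- H separate query heads
    k: (T, d)     -- single shared key head
    v: (T, d)     -- single shared value head
    """
    d = q.shape[-1]
    # scores: (H, T, T) -- every query head scores against the same K
    scores = q @ k.T / np.sqrt(d)
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # every head also reads from the same shared V
    return weights @ v  # (H, T, d)
```

The shared K/V means the KV cache is one head wide instead of H heads wide, which is the main inference-time benefit of the scheme.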
FasterTransformer development has transitioned to TensorRT-LLM.
Falcon is supported in TensorRT-LLM; please refer to this example.