FasterTransformer
Support for Falcon models
Since Falcon is a multi-query attention model, and FT doesn't support weight conversion for multi-query attention models, is support for this planned?
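For context, multi-query attention (the pattern Falcon uses) shares a single key/value head across all query heads, which is why a converter written for standard multi-head checkpoints doesn't apply directly. A minimal NumPy sketch of the idea (illustrative only, not FasterTransformer code; all names here are hypothetical):

```python
import numpy as np

def multi_query_attention(q, k, v):
    """Multi-query attention sketch: H query heads all attend
    against one shared key head and one shared value head.

    q: (H, T, d)  -- H separate query heads
    k: (T, d)     -- single shared key head
    v: (T, d)     -- single shared value head
    """
    d = q.shape[-1]
    # scores: (H, T, T) -- every query head scores against the same K
    scores = q @ k.T / np.sqrt(d)
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # every head also reads from the same shared V
    return weights @ v  # (H, T, d)
```

The shared K/V means the KV cache is one head wide instead of H heads wide, which is the main inference-time benefit of the scheme.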
FasterTransformer development has transitioned to TensorRT-LLM.
Falcon is supported in TensorRT-LLM; please refer to this example.