FasterTransformer
LLaMA support
Given the existing support for GPT-J and its rotary embeddings, is LLaMA supported as well? Hugging Face just shipped their implementation: https://github.com/huggingface/transformers/commit/464d420775653885760e30d24d3703e14f4e8a14
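For reference, LLaMA's rotary embeddings differ slightly from GPT-J's: GPT-J applies the rotation to interleaved dimension pairs (and only to the first `rotary_dim` dimensions of each head), while LLaMA (and the linked Hugging Face implementation) rotates the two halves of the full head dimension. A minimal NumPy sketch of the LLaMA-style variant — illustrative only, not FasterTransformer code:

```python
import numpy as np

def rope_llama(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply LLaMA-style rotary position embedding.

    x: (seq_len, head_dim) query or key slice for one head; head_dim even.
    The rotation pairs dimension i with dimension i + head_dim // 2
    ("rotate-half"), unlike GPT-J, which pairs adjacent dimensions 2i, 2i+1.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Frequencies: theta_i = base^(-2i/dim), i = 0 .. half-1
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation of each (x1_i, x2_i) pair by the position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Position 0 gets a zero rotation (output equals input), and each per-position rotation preserves the vector norm, so relative attention scores depend only on position offsets.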
@byshiue