Liger-Kernel
[feat] Add support for encoder-only transformers (e.g. BERT)
🚀 The feature, motivation and pitch
Liger Kernel currently does not support encoder-only transformer architectures such as BERT, DistilBERT, RoBERTa, XLM-R, and DeBERTa.
Given the continued importance of these models in research and industry use cases, it would be great to see support added so they too can benefit from reduced memory requirements and increased training throughput.
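To make the ask concrete, here is a minimal sketch of what such support might look like, mirroring the patch-style entry points Liger Kernel already exposes for decoder models (e.g. `apply_liger_kernel_to_llama`). Everything below is hypothetical illustration: `bert_modeling` stands in for the real Hugging Face modeling module, `apply_liger_kernel_to_bert` does not exist yet, and the "fused" op is an ordinary Python function rather than a Triton kernel.

```python
"""Sketch of a hypothetical `apply_liger_kernel_to_bert` entry point.

The pattern: swap an eager op on the modeling module for a fused
implementation that produces numerically equivalent results. All names
here are illustrative assumptions, not real Liger Kernel or
transformers APIs.
"""
import types

# Stand-in for `transformers.models.bert.modeling_bert`.
bert_modeling = types.SimpleNamespace()


def eager_layer_norm(x, eps=1e-12):
    # Reference (unfused) layer norm over a flat list of floats.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


bert_modeling.layer_norm = eager_layer_norm


def fused_layer_norm(x, eps=1e-12):
    # In Liger Kernel this would be a Triton kernel; here it is a
    # plain-Python rewrite that computes the same result in one pass
    # using E[x^2] - E[x]^2 for the variance.
    n = len(x)
    mean = sum(x) / n
    var = sum(v * v for v in x) / n - mean * mean
    inv_std = (var + eps) ** -0.5
    return [(v - mean) * inv_std for v in x]


def apply_liger_kernel_to_bert(layer_norm=True):
    """Hypothetical patch function following the existing
    `apply_liger_kernel_to_*` convention: replace selected ops on the
    modeling module with their fused counterparts."""
    if layer_norm:
        bert_modeling.layer_norm = fused_layer_norm


# Usage: patch once before model construction, then use the model as usual.
apply_liger_kernel_to_bert()
out = bert_modeling.layer_norm([1.0, 2.0, 3.0])
```

The same pattern would extend naturally to the other encoder families listed above, with per-architecture flags for which ops (layer norm, GELU, cross-entropy) to fuse.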
Alternatives
No response
Additional context
No response