Liger-Kernel
Extending Liger-Kernel Optimizations to Encoder Models Like BERT
🚀 The feature, motivation and pitch
Hey team,
I’ve been exploring Liger-Kernel’s optimizations for decoder models like GPT, and I’m curious about extending these benefits to encoder models such as BERT.
BERT is the go-to architecture in areas like discrete diffusion models, a promising research direction for next-generation LLMs. In AI for biology, BERT-style encoders are exemplified by ESM (Lin et al., Science 2023, https://www.science.org/doi/10.1126/science.ade2574), which enables significant scientific applications such as protein structure prediction, the problem behind the 2024 Nobel Prize in Chemistry.
Given Liger-Kernel’s success in boosting training throughput and reducing GPU memory usage for decoder models, applying similar optimizations to encoder architectures seems promising. I’m interested in discussing the feasibility of adapting Liger-Kernel’s techniques to encoder models (a rough sketch of what that could look like is below) and would appreciate any insights or considerations from the community.
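To make the pitch concrete, here is a minimal sketch of what an encoder entry point could look like, in the spirit of the existing `apply_liger_kernel_to_*` patching functions. The `apply_liger_kernel_to_bert` helper below is hypothetical, and the exact import paths and constructor signatures for `LigerLayerNorm` / `LigerCrossEntropyLoss` are assumptions based on recent releases, so they should be double-checked against the current API:

```python
import torch.nn as nn
from transformers import BertForMaskedLM
from liger_kernel.transformers import LigerCrossEntropyLoss  # assumed export
from liger_kernel.transformers.layer_norm import LigerLayerNorm  # assumed path


def apply_liger_kernel_to_bert(model: nn.Module) -> nn.Module:
    """Hypothetical helper: replace BERT's eager nn.LayerNorm modules with
    Liger's Triton LayerNorm, mirroring apply_liger_kernel_to_llama."""
    for name, module in list(model.named_modules()):
        if isinstance(module, nn.LayerNorm):
            liger_ln = LigerLayerNorm(module.normalized_shape[0], eps=module.eps)
            # Reuse the pretrained affine parameters.
            liger_ln.weight = module.weight
            liger_ln.bias = module.bias
            # Swap the module in place on its parent.
            parent_name, _, child_name = name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, child_name, liger_ln)
    return model


model = apply_liger_kernel_to_bert(
    BertForMaskedLM.from_pretrained("bert-base-uncased")
)
# The MLM objective could likewise use Liger's cross entropy kernel:
mlm_loss_fn = LigerCrossEntropyLoss(ignore_index=-100)
```

Since BERT uses plain (non-gated) GELU MLPs, learned absolute position embeddings, and a large-vocabulary MLM head, my guess is the biggest wins would come from LayerNorm, GELU, and a fused linear + cross entropy on the MLM projection, rather than from the RoPE/SwiGLU kernels that benefit decoder models.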
Alternatives
No response
Additional context
No response