
Extending Liger-Kernel Optimizations to Encoder Models Like BERT

Open pengzhangzhi opened this issue 10 months ago • 0 comments

🚀 The feature, motivation and pitch

Hey team,

I’ve been exploring Liger-Kernel’s optimizations for decoder models like GPT, and I’m curious about extending these benefits to encoder models such as BERT.

BERT is the go-to architecture in areas like discrete diffusion models, a promising research direction for next-generation LLMs. In AI for biology, BERT-style encoders are exemplified by ESM (Lin et al., Science 2023, https://www.science.org/doi/10.1126/science.ade2574), which enables significant scientific applications such as protein structure prediction, the problem recognized by the 2024 Nobel Prize in Chemistry.

Given Liger-Kernel’s success in boosting training throughput and reducing GPU memory usage for decoder models, applying similar optimizations to encoder architectures seems promising. I’m interested in discussing the feasibility of adapting Liger-Kernel’s techniques for encoder models and would appreciate any insights or considerations from the community.
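To make the idea concrete, below is a minimal sketch of what patching a Hugging Face BERT with Liger kernels might look like. It assumes that `liger_kernel.transformers` exports `LigerLayerNorm` and `LigerCrossEntropyLoss` (their exact signatures may differ across versions); the `patch_bert_with_liger` helper is hypothetical, since there is currently no official `apply_liger_kernel_to_bert()` entry point. This is an illustration of feasibility, not the library's supported API.

```python
# Hedged sketch: swapping BERT's nn.LayerNorm modules for Liger's fused
# Triton LayerNorm kernel. LigerLayerNorm / LigerCrossEntropyLoss are
# assumed exports; patch_bert_with_liger is a hypothetical helper.
import torch.nn as nn
from transformers import BertForMaskedLM
from liger_kernel.transformers import LigerCrossEntropyLoss, LigerLayerNorm


def patch_bert_with_liger(model: BertForMaskedLM) -> BertForMaskedLM:
    # Walk the module tree and replace each nn.LayerNorm in place,
    # reusing the original affine parameters so weights are preserved.
    for module in model.modules():
        for child_name, child in module.named_children():
            if isinstance(child, nn.LayerNorm):
                liger_ln = LigerLayerNorm(child.normalized_shape[0], eps=child.eps)
                liger_ln.weight = child.weight
                liger_ln.bias = child.bias
                setattr(module, child_name, liger_ln)
    return model


model = patch_bert_with_liger(BertForMaskedLM.from_pretrained("bert-base-uncased"))

# The MLM loss over the full vocabulary is where a fused cross-entropy
# kernel should save the most memory, analogous to the decoder case.
mlm_loss_fn = LigerCrossEntropyLoss()
```

One design note: BERT's FFN uses a plain GELU MLP rather than the gated SwiGLU/GeGLU blocks Liger fuses for LLaMA-style decoders, so LayerNorm and the vocabulary-sized cross-entropy are likely the lowest-hanging fruit for encoders.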

Alternatives

No response

Additional context

No response

pengzhangzhi · Dec 26 '24 15:12