MoEfication
Scripts for Exploring the Benefit of Activation Sparsity in Pre-training
Hello, I am wondering whether you have any plans to release the scripts for sparsifying and training Persimmon-8B. I only see T5, BERT, and GPT in this repo.