MoEfication
MoEfication issues (3)
After downloading the T5-base model from Hugging Face, running adj.py throws an error; after renaming the weights as follows, the error persists:
In the paper: "For the MoE layers, we set the number of experts N to 32 for MoE-Dropout and SSD. MoE-Dropout linearly increases the number of selected experts K from 6..."
Hello, I am wondering whether you have any plans to release the sparsification and training scripts for Persimmon-8B. I only see T5, BERT, and GPT in this repo.