SnippetZero
I ran alpaca-lora finetuning with both load_in_8bit=True and load_in_8bit=False. With load_in_8bit=True, training was more than twice as slow as with False. What could be the reason for this?
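For context, a minimal sketch of the two configurations being compared (assumptions not in the original post: a placeholder LLaMA checkpoint name, `device_map="auto"`, and a transformers version that still accepts `load_in_8bit` directly rather than via `BitsAndBytesConfig`):

```python
import torch
from transformers import AutoModelForCausalLM

BASE_MODEL = "decapoda-research/llama-7b-hf"  # placeholder; substitute your own checkpoint

# 8-bit path: weights are quantized with bitsandbytes LLM.int8(). Each linear
# layer then pays for int8<->fp16 conversion and outlier handling inside its
# matmul, which is a commonly cited reason the forward/backward pass is slower
# than plain fp16 even though it uses much less memory.
model_int8 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# fp16 path: larger memory footprint, but no per-layer quantization overhead.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map="auto",
)
```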
@zheyishine Hi, has there been any progress?
> In fact, we have already implemented the Medusa TreeMask version in LMDeploy. When batch=1, the acceleration ratio and RPS improvement relative to the main branch are consistent with those...
> I will split the internal implementation of the TreeMask version into multiple PRs and then submit them.

Thank you, could you share the methods to solve the performance degradation...
> EAGLE has a higher computational load than Medusa, but it has a higher acceptance rate. It performs better in large batches compared to Medusa. However, this is just a...
https://github.com/sgl-project/sglang/pull/6151
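A back-of-envelope way to read the tradeoff quoted above (a sketch only; all numbers below are hypothetical and are not LMDeploy, sglang, Medusa, or EAGLE measurements):

```python
def effective_speedup(mean_accepted_tokens: float, step_cost_ratio: float) -> float:
    """Tokens committed per unit of baseline decode time.

    mean_accepted_tokens: average tokens accepted per speculative step
                          (1.0 would match plain autoregressive decoding).
    step_cost_ratio: cost of one speculative step (draft + tree verification)
                     relative to one plain decode step.
    """
    return mean_accepted_tokens / step_cost_ratio

# Hypothetical small-batch case: verification is largely memory-bound, so a
# heavier drafter's extra compute is mostly hidden and both methods look good.
print(effective_speedup(mean_accepted_tokens=2.5, step_cost_ratio=1.2))  # ~2.08x
print(effective_speedup(mean_accepted_tokens=3.2, step_cost_ratio=1.4))  # ~2.29x

# Hypothetical large-batch case: the same steps become compute-bound and thus
# relatively more expensive, so the higher acceptance rate matters more than
# the lower per-step cost.
print(effective_speedup(mean_accepted_tokens=2.5, step_cost_ratio=2.0))  # ~1.25x
print(effective_speedup(mean_accepted_tokens=3.2, step_cost_ratio=2.2))  # ~1.45x
```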
@lzhangzz Hi, is there a plan to implement FA3 in the turbomind engine? Thanks!