SnippetZero

7 comments by SnippetZero

I fine-tuned with alpaca-lora and tested both load_in_8bit=True and load_in_8bit=False. With load_in_8bit=True, training was more than twice as slow as with False. What could be causing this?

> In fact, we have already implemented the Medusa TreeMask version in LMDeploy. **When batch=1, the acceleration ratio and RPS improvement relative to the main branch are consistent with those...
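For context on the term: in Medusa-style tree decoding, several candidate continuations are verified in a single forward pass, and a tree attention mask makes each candidate token attend only to itself and its ancestors in the candidate tree. Here is a minimal sketch of that idea, assuming the tree is encoded as a list of parent indices. `build_tree_mask` and this encoding are hypothetical illustrations, not LMDeploy's actual TreeMask implementation.

```python
from typing import List


def build_tree_mask(parents: List[int]) -> List[List[int]]:
    """Hypothetical helper: build an n x n 0/1 tree attention mask.

    parents[i] is the index of candidate node i's parent; -1 means the node
    hangs directly off the last accepted token (visible to every node via
    the KV cache, so it is not part of this mask). The parent links are
    assumed to form a tree (no cycles).
    mask[i][j] == 1 iff node j is node i itself or one of its ancestors.
    """
    n = len(parents)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        j = i
        # Walk from node i up to the root, unmasking itself and each ancestor.
        while j != -1:
            mask[i][j] = 1
            j = parents[j]
    return mask


if __name__ == "__main__":
    # Two top-level candidates (nodes 0 and 1), each with one child (nodes 2 and 3).
    for row in build_tree_mask([-1, -1, 0, 1]):
        print(row)
    # [1, 0, 0, 0]
    # [0, 1, 0, 0]
    # [1, 0, 1, 0]
    # [0, 1, 0, 1]
```

Node 2 can see node 0 (its parent) but not node 1, which lies on a different branch; that isolation between branches is what lets all candidates share one forward pass.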

> I will split the internal implementation of the TreeMask version into multiple PRs and then submit them. Thank you. Could you share the methods to solve the performance degradation...

> EAGLE has a higher computational load than Medusa, but it has a higher acceptance rate. It performs better in large batches compared to Medusa. However, this is just a...

@lzhangzz Hi, is there a plan to implement FA3 (FlashAttention-3) in the turbomind engine? Thanks!