Casper
Multi-node GRPO only works with `ray job submit -- python3 -u -m verl.trainer.main_ppo ...`
Hi @paolovic, at the moment this is not something explicitly supported or even something that I have attempted. I suspect it could be possible, but it's not something that I...
Any update on this? Lots of people are waiting for this to be resolved so they can upgrade to the new AutoTP for additional optimization in their training.
Hi @BinFuPKU, thanks for raising the issue. I will need to further investigate what causes this, but I can see it will not be easy to debug since the model...
@Kk1984up try upgrading to the newest version.
I think you may have an issue with your torch installation. Try reinstalling torch.
+++ would love to see MS-AMP supported. Currently, H100s are on par with A100s cost-wise even with the current FP8 implementation, but if MS-AMP FP8 can be implemented, it is...
Shouldn’t the FLOPs increase, thereby reducing training time? It should not be present on small models, but if you take a 30B model, I would be surprised if you don’t...
@MuYu-zhi please check out the GEMM linear module. All weights are packed in a special way that is tied to how the CUDA kernels execute.
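For anyone unfamiliar with what "packed" means here, this is a minimal sketch of the general idea, assuming 4-bit quantized values stored eight to a 32-bit word. The sequential nibble order is an assumption for illustration only and is not necessarily the layout the GEMM kernels expect, and `pack_int4` / `unpack_int4` are hypothetical helper names, not functions from the library.

```python
def pack_int4(values):
    """Pack eight unsigned 4-bit integers into one 32-bit word.

    Illustrative layout only: value i goes into bits [4*i, 4*i + 3].
    Real kernels may interleave the nibbles differently.
    """
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError(f"value {v} does not fit in 4 bits")
        word |= v << (4 * i)
    return word


def unpack_int4(word):
    """Recover the eight 4-bit integers from a packed 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


weights = [1, 2, 3, 4, 5, 6, 7, 8]
packed = pack_int4(weights)

# Round trip: unpacking must give back the original values.
assert unpack_int4(packed) == weights
```

The point is that a packed tensor cannot be read as ordinary weights: each stored integer holds several quantized values, and the order in which they are laid out has to match what the kernel expects.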
I made an initial attempt that did not work. https://github.com/casper-hansen/AutoAWQ/compare/main...gemma2. Unfortunately, I do not have enough time at the moment to do further research on how to support the new...