Erfan Zare Chavoshi
Thanks @demon2036 for sharing this, but we are using FA3 kernels for GPUs and Splash Attention for TPUs, so the attention dtype isn't really the issue. Thanks anyway, I'll double-check.
Thanks @RohitRathore1, this is awesome! Sure, I'll integrate this into EasyDeL soon; it should take about 1 or 2 days. I'll let you know when it's done.
Thanks @theonlyfoxy for the recommendation. This will be added soon.
LoRA + GRPO is now supported; just apply LoRA to your model and configure it before training, as in the sketch below.
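A minimal sketch of what that looks like; the method name and arguments here (`apply_lora_to_layers`, `rank`, `target_modules`) are assumptions, so check the current EasyDeL docs for the exact API in your version:

```python
import easydel as ed

# Load the model as usual (hypothetical repo id).
model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
)

# Attach LoRA adapters before building the trainer; only the adapter
# parameters are updated during GRPO training.
model = model.apply_lora_to_layers(
    rank=16,                                             # assumed argument name
    target_modules=".*(q_proj|k_proj|v_proj|o_proj).*",  # assumed argument name
)

# ...then pass the LoRA-wrapped model to the GRPO trainer exactly as before.
```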
Hi, you cannot use the flash attention mechanism with sequence sharding strategies; it will crash. Make sure you are using FSDP sharding instead of SP.
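For example, something along these lines (a sketch; the axis order in `sharding_axis_dims` is assumed to be `(dp, fsdp, tp, sp)` and may differ in your EasyDeL version):

```python
import easydel as ed

# Give all devices to the fsdp axis and keep the sequence-parallel axis at 1,
# so flash attention never sees a sequence-sharded input.
model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",  # hypothetical repo id
    sharding_axis_dims=(1, -1, 1, 1),    # dp=1, fsdp=all devices, tp=1, sp=1
)
```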
Do this at the start of your imports:

```python
import os
os.environ["EASYDEL_AUTO"] = "false"

import jax
jax.print_environment_info()
```

and check if that fixes it.
Sure, I'm working on that.
Yes, actually that's fixed, but there are still some other issues from new experimental features... they will all be fixed soon, but in case you are not in Discord...
Your partition specs are wrong. Replace `weight` with `kernel` and the dot separator with `/`, as in the sketch below.
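For illustration (a sketch; the axis names in the `PartitionSpec`s and the exact layer paths are placeholders for your own rules):

```python
from jax.sharding import PartitionSpec

# Wrong: PyTorch-style names with dot separators never match the Flax parameter tree.
bad_rules = (
    ("model.layers.*.self_attn.q_proj.weight", PartitionSpec("fsdp", "tp")),
)

# Right: Flax stores linear-layer weights as `kernel`, and the flattened
# parameter path uses `/` as the separator.
good_rules = (
    (r".*/self_attn/(q_proj|k_proj|v_proj|o_proj)/kernel", PartitionSpec("fsdp", "tp")),
)
```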
Actually, they are fixed now.