Erfan Zare Chavoshi
Thanks @demon2036 for sharing this, but we are using FA3 kernels for GPUs and Splash Attention for TPUs, so the attention dtype isn't really the issue. Thanks anyway, I'll double-check.
Thanks @RohitRathore1, this is awesome! Sure, I'll integrate this into EasyDeL soon; it should take about 1 or 2 days. I'll let you know when it's done.
Thanks @theonlyfoxy for the recommendation. This will be added soon.
LoRA + GRPO is now supported; just apply LoRA to your model and configure it before training, as in the sketch below.
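A minimal sketch of what that looks like; the method name and arguments here (`apply_lora_to_layers`, `rank`, `target_modules`) are assumptions, so check the current EasyDeL docs for the exact API in your version:

```python
import easydel as ed

# Load the model as usual (hypothetical repo id).
model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
)

# Attach LoRA adapters before building the trainer; only the adapter
# parameters are updated during GRPO training.
model = model.apply_lora_to_layers(
    rank=16,                                             # assumed argument name
    target_modules=".*(q_proj|k_proj|v_proj|o_proj).*",  # assumed argument name
)

# ...then pass the LoRA-wrapped model to the GRPO trainer exactly as before.
```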
Hi, you cannot use the flash attention mechanism with sequence sharding strategies; it will crash. Make sure you are using FSDP sharding instead of SP.
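For example, something along these lines (a sketch; the axis order in `sharding_axis_dims` is assumed to be `(dp, fsdp, tp, sp)` and may differ in your EasyDeL version):

```python
import easydel as ed

# Give all devices to the fsdp axis and keep the sequence-parallel axis at 1,
# so flash attention never sees a sequence-sharded input.
model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",  # hypothetical repo id
    sharding_axis_dims=(1, -1, 1, 1),    # dp=1, fsdp=all devices, tp=1, sp=1
)
```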
Do this at the start of your imports:

```python
import os
os.environ["EASYDEL_AUTO"] = "false"

import jax
jax.print_environment_info()
```

and check if that fixes it.
Sure, I'm working on that.
Yes, actually that's fixed, but there are still some other issues from new experimental features... they will all be fixed soon, but in case you are not in Discord...
Your partition specs are wrong. Replace `weight` with `kernel` and the dot separator with `/`, as in the sketch below.
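For illustration (a sketch; the axis names in the `PartitionSpec`s and the exact layer paths are placeholders for your own rules):

```python
from jax.sharding import PartitionSpec

# Wrong: PyTorch-style names with dot separators never match the Flax parameter tree.
bad_rules = (
    ("model.layers.*.self_attn.q_proj.weight", PartitionSpec("fsdp", "tp")),
)

# Right: Flax stores linear-layer weights as `kernel`, and the flattened
# parameter path uses `/` as the separator.
good_rules = (
    (r".*/self_attn/(q_proj|k_proj|v_proj|o_proj)/kernel", PartitionSpec("fsdp", "tp")),
)
```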
Actually, they are fixed now.