Erfan Zare Chavoshi comments

Results: 26 comments of Erfan Zare Chavoshi

LLaMA 2 support for pre-training

I have implemented a version of that, but I haven't tested it yet. I used the same architecture as EasyLM in some parts: https://github.com/erfanzar/EasyDeL/blob/main/EasyDel/modules/llama/modelling_llama_flax.py

[Feature]: Team members should have ability to leave a project

Is it available now? As far as I can tell, there's still no way to leave a team. It's kind of funny that such a basic feature is not supported yet.

Feature request: Use Orbax for checkpointing.

Orbax does not support streaming loads or sharding data or arrays across devices with pjit, so I think the current checkpointing method that is being used right now is a...

checkpoint's size is increasing everytime.

Is it possible to share the weights and state with me, so I can debug that and fix the issue? Anyway, that's the first time I've seen an issue like that I...

checkpoint's size is increasing everytime.

This issue might be fixed due to recent changes and bug fixes in fjformer over the past few days.

Mosaic kernels cannot be automatically partitioned. Please wrap the call in a shard_map or xmap

Hi, and thanks for using EasyDeL. Actually, I'm still building that and mostly focusing on CPU and GPU, so I forgot to test it on TPUs ... I'll fix that soon,...

Mosaic kernels cannot be automatically partitioned. Please wrap the call in a shard_map or xmap

I have fixed the issue related to shard_map and xmap ..., but some custom kernels are still not supported or compute incorrect results on TPUv3, and Pallas flash attention can...

vmem OOM on TPU

Use 1,-1,1,1 as the sharding axis dims; that's the best sharding case. Alternatively, write custom sharding methods and use FSDP on every layer, which is easier.
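
As a rough illustration of what a 1,-1,1,1 setting means, here is a minimal sketch in plain Python. It assumes the four entries map to mesh axes named dp, fsdp, tp, and sp (an assumption, not stated in the comment) and that -1 means "use all remaining devices". The helper `resolve_mesh_shape` is hypothetical and not part of any library.

```python
from math import prod

# Hypothetical axis names; assumed here, not taken from the comment above.
AXIS_NAMES = ("dp", "fsdp", "tp", "sp")


def resolve_mesh_shape(axis_dims, num_devices):
    """Replace a single -1 in axis_dims so the product equals num_devices."""
    axis_dims = tuple(axis_dims)
    if axis_dims.count(-1) > 1:
        raise ValueError("at most one axis may be -1")
    # Product of the explicitly sized axes (the -1 placeholder is skipped).
    known = prod(d for d in axis_dims if d != -1)
    if -1 not in axis_dims:
        if known != num_devices:
            raise ValueError("axis dims do not multiply to the device count")
        return axis_dims
    if num_devices % known != 0:
        raise ValueError("device count is not divisible by the fixed axes")
    # The -1 axis absorbs every device not claimed by the other axes.
    return tuple(num_devices // known if d == -1 else d for d in axis_dims)


# With 8 devices, 1,-1,1,1 places all of them on the second (fsdp) axis.
shape = resolve_mesh_shape((1, -1, 1, 1), 8)
print(dict(zip(AXIS_NAMES, shape)))
```

Under these assumptions every device lands on the fsdp axis, so parameters are sharded across all devices with no tensor or sequence parallelism.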

"module 'jax.core' has no attribute 'new_main'" using jax>=0.4.36

Hi @davisyoshida, I hope you're doing well! I wanted to reach out regarding **qax** usage in **fjformer** for quantization workflows within the EasyDel project. After the 0.4.35 release, I noticed...

Gradient Checkpointing causes model to compute junk results (NNX)

Any update or help on this?