
ERROR: Accessing retired flag 'jax_enable_async_collective_offload'

Open • LeoXinhaoLee opened this issue 11 months ago • 1 comment

Hi, thank you so much for releasing this wonderful codebase. When I try to run pretrain_llama_7b on a TPU v3 pod, I get this error:

ERROR: Accessing retired flag 'jax_enable_async_collective_offload'

It seems to be related to the flags set in LIBTPU_INIT_ARGS before launching the job:

export LIBTPU_INIT_ARGS='--xla_jf_spmd_threshold_for_windowed_einsum_mib=0 \
--xla_tpu_spmd_threshold_for_allgather_cse=10000 \
--xla_tpu_spmd_rewrite_einsum_with_reshape=true \
--xla_enable_async_all_gather=true \
--jax_enable_async_collective_offload=true \
--xla_tpu_enable_latency_hiding_scheduler=true TPU_MEGACORE=MEGACORE_DENSE'

Are all of these flags necessary, and could one of them be causing the error? Thank you very much for your time and help!

LeoXinhaoLee • Mar 15 '24 07:03

Same problem here.

s-smits • Jul 14 '24 12:07
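
The thread does not confirm a fix. Since the error message names `jax_enable_async_collective_offload`, one thing to try is launching with that single flag removed from `LIBTPU_INIT_ARGS` and the rest left as they were. The sketch below assumes that the retired flag is the only one the installed libtpu rejects and that the remaining flags are still accepted; neither is verified here. `strip_flag` is a hypothetical helper written for this example, not part of EasyLM or JAX.

```shell
# Hypothetical helper: remove one flag, given by name without the leading
# dashes, from a space-separated flag string. Runs in a subshell so that
# `set -f` and the temporary variables do not leak into the caller.
strip_flag() (
  name="$1"
  args="$2"
  out=""
  set -f  # disable globbing while splitting the string into words
  for tok in $args; do
    case "$tok" in
      "--${name}"|"--${name}="*) ;;        # skip the flag, with or without a value
      *) out="${out:+$out }$tok" ;;        # keep every other token in order
    esac
  done
  printf '%s\n' "$out"
)

# The flag string from the report above, written on a single line.
LIBTPU_INIT_ARGS='--xla_jf_spmd_threshold_for_windowed_einsum_mib=0 --xla_tpu_spmd_threshold_for_allgather_cse=10000 --xla_tpu_spmd_rewrite_einsum_with_reshape=true --xla_enable_async_all_gather=true --jax_enable_async_collective_offload=true --xla_tpu_enable_latency_hiding_scheduler=true TPU_MEGACORE=MEGACORE_DENSE'

# Assumption: only the flag named in the error needs to go.
LIBTPU_INIT_ARGS="$(strip_flag jax_enable_async_collective_offload "$LIBTPU_INIT_ARGS")"
export LIBTPU_INIT_ARGS
```

If a different flag is reported as retired on the next run, the same helper can be applied again with that flag's name.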