Results: 18 issues by Xinhao Li

Hi, when running `llama_train.py` in distributed mode on a v3-512 TPU pod with evaluation turned on (`eval_steps > 0`), I get this error: ``` RuntimeError: Running operations on `Array`s that are...

Hi, thank you so much for releasing this wonderful code! I noticed that in your `examples/pretrain_llama_7b.sh`, `dtype` is set to `fp32`, which seems to make the activations `fp32`. However, I think...
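For context, a minimal sketch (not this repository's code; module and field names are illustrative, assuming a Flax-style setup) of how a `dtype` argument typically controls activation precision while parameters stay in `fp32`:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLPBlock(nn.Module):
    hidden_size: int = 4096          # illustrative value
    dtype: jnp.dtype = jnp.bfloat16  # activation dtype; `fp32` in the example script

    @nn.compact
    def __call__(self, x):
        # param_dtype defaults to fp32, so only the activations follow `dtype`.
        x = nn.Dense(self.hidden_size, dtype=self.dtype)(x)
        x = nn.gelu(x)
        return nn.Dense(self.hidden_size, dtype=self.dtype)(x)

block = MLPBlock()
params = block.init(jax.random.PRNGKey(0), jnp.ones((1, 8, 4096)))
y = block.apply(params, jnp.ones((1, 8, 4096)))
print(y.dtype)  # bfloat16 activations, while the stored params remain float32
```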

Hi, thank you so much for releasing this wonderful codebase. When trying to run `pretrain_llama_7b` on a v3 TPU pod, I get this error: ``` ERROR: Accessing retired flag 'jax_enable_async_collective_offload'...

Hi, thank you very much for releasing this great dataset. I am wondering whether the **original PILE dataset** (with 30 chunks) has already been shuffled, or whether we still need to...

Hi, thank you for releasing this great codebase. I noticed that if a job is interrupted twice (say the first interruption is at step 25, then it resumes and continues until step 45,...


Thank you very much for the update to support the Llama 3 model! I noticed that `config.initializer_range` defaults to 0.02, and `jax.nn.initializers.normal(self.config.initializer_range / np.sqrt(config.hidden_size))` is used for initialization. However, in...
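For reference, a minimal sketch of the scaled-normal initialization the issue refers to (the `hidden_size` value here is an assumed, illustrative one, not the model's actual config):

```python
import jax
import jax.numpy as jnp
import numpy as np

initializer_range = 0.02   # default mentioned in the issue
hidden_size = 4096         # illustrative LLaMA-like value

# stddev is scaled down by sqrt(hidden_size), as in the quoted call.
init_fn = jax.nn.initializers.normal(initializer_range / np.sqrt(hidden_size))
w = init_fn(jax.random.PRNGKey(0), (hidden_size, hidden_size), jnp.float32)
print(w.std())  # roughly 0.02 / sqrt(4096) ≈ 3.1e-4
```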

Hi, thank you so much for releasing this great codebase! I noticed that your Laion blog says that the pre-training of OpenLM 1B/7B took place on 128 or 256...

### 🚀 The feature, motivation and pitch

The current llama-recipes codebase only supports eval for llama 3.1 models. It would be very helpful to add eval for llama 3.2 1B-Base/Instruct...