tianyu-l

33 comments by tianyu-l

@XinDongol Thanks for trying it out! The work is still in progress and not ready yet. Yes, we are aware of the issue you mentioned, and we are working on distributed checkpointing...

> I was wondering whether you found the root cause of all ranks receiving the same dataloader state_dict? I guess that it is because the state_dict is not in...

@XinDongol Thanks for the note! I believe we are aware of the issue (@gokulavasan to double-check). The reason we didn't prioritize supporting `num_workers > 1` is that the llama training is...
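To illustrate the issue quoted above, here is a minimal sketch of one way to keep each rank's dataloader state distinct in a checkpoint: key the serialized state by rank so distributed checkpointing doesn't collapse the entries into one shared copy. The wrapper class and key format are hypothetical, and it assumes the wrapped loader exposes `state_dict`/`load_state_dict` (as torchdata's `StatefulDataLoader` does):

```python
import pickle

import torch.distributed as dist


class RankAwareDataLoaderState:
    """Hypothetical wrapper: save/restore dataloader state per rank.

    Keying the serialized state by rank keeps each rank's progress
    distinct in the checkpoint, instead of every rank reading back
    the same (replicated) state_dict.
    """

    def __init__(self, dataloader):
        # The wrapped loader must expose state_dict/load_state_dict.
        self.dataloader = dataloader
        self.rank = dist.get_rank() if dist.is_initialized() else 0

    def state_dict(self):
        # Serialize under a per-rank key so entries are not deduplicated.
        return {
            f"dataloader_rank_{self.rank}": pickle.dumps(self.dataloader.state_dict())
        }

    def load_state_dict(self, state_dict):
        key = f"dataloader_rank_{self.rank}"
        if key in state_dict:
            self.dataloader.load_state_dict(pickle.loads(state_dict[key]))
```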

@XinDongol We appreciate your feedback a lot!

> I tried a 1B model and found that data loading time is about 10% of end-to-end time when num_workers=1 for torchtitan with on-the-fly...

@lessw2020 will connect with HF to see if they can support weight conversion from HF to PyTorch. After that, we may integrate it into the code or update the tutorial.

@rlrs Thanks, please feel free to share it here! As far as we know, HF is also working on such a script to convert from HF to DCP. As discussed...

@bkchang From the HF [website](https://huggingface.co/docs/transformers/main/en/model_doc/llama3), there's a [script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py) to convert llama weights to the HF format.
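Until an official script lands, a rough sketch of the other direction (HF checkpoint → DCP) might look like the following. The model id is a placeholder, it assumes the weights fit in host memory and a recent PyTorch where `torch.distributed.checkpoint.save` can run in a single process, and a real conversion would still have to remap parameter names to the target model definition:

```python
import torch
import torch.distributed.checkpoint as dcp
from transformers import AutoModelForCausalLM

# Load HF weights on CPU (assumes they fit in host memory).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder model id
    torch_dtype=torch.bfloat16,
)

# NOTE: a real conversion would remap HF parameter names (and undo the
# rotary q/k weight permutation HF applies) to match the target model.
state_dict = {"model": model.state_dict()}

# Write a DCP checkpoint to the given directory.
dcp.save(state_dict, checkpoint_id="./llama3_dcp_checkpoint")
```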

Closing, as we removed the feature in #535.

@qiziAI Thanks for pointing this out! Since the newly added import `_copy_state_dict` is not used by default, we don't need to require the most recent PyTorch. This is fixed...
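The fix amounts to the usual guarded-import pattern, sketched below. The module path is taken from a recent PyTorch and may move between versions, and `copy_into` is a hypothetical helper (verify the real call's argument order against your PyTorch build):

```python
# Guard the version-dependent import: only the optional code path needs
# the helper, so older PyTorch builds should still work without it.
try:
    from torch.distributed._state_dict_utils import _copy_state_dict
except ImportError:
    _copy_state_dict = None  # optional feature disabled on older builds


def copy_into(dst_state_dict, src_state_dict):
    # Hypothetical helper: fail only when the optional path is exercised.
    if _copy_state_dict is None:
        raise RuntimeError("This feature requires a more recent PyTorch build.")
    # Argument order assumed from a recent PyTorch; check your version.
    return _copy_state_dict(src_state_dict, dst_state_dict)
```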

Hi @rlrs, thanks for bringing up the concern! We are using the same definition as in the llama3 code: https://github.com/meta-llama/llama3/blob/main/llama/model.py#L65. Would you provide more details on how you verified the loaded...
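For reference, assuming the linked line sits in the rotary-embedding setup (an inference from the line number, not confirmed here), the frequency precomputation in the llama3 reference code reads roughly as follows; this is the definition a weight-conversion script has to match, since HF permutes the q/k projection weights to fit its own rotary layout:

```python
import torch


# Rotary frequency precomputation as defined in the llama3 reference code
# (paraphrased sketch; see llama/model.py in meta-llama/llama3).
def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    t = torch.arange(end, device=freqs.device, dtype=torch.float32)
    freqs = torch.outer(t, freqs)
    # Complex exponentials e^{i * t * freq}, applied to interleaved q/k pairs.
    return torch.polar(torch.ones_like(freqs), freqs)
```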