tianyu-l

33 comments by tianyu-l

@XinDongol Thanks for trying it out! The work is still in progress and not ready yet. Yes, we are aware of the issue you mentioned, and we are working on distributed checkpointing...

> I was wondering whether you found the root cause of all ranks receiving the same dataloader state_dict? I guess that it is because the state_dict is not in...

@XinDongol Thanks for the note! I believe we are aware of the issue (@gokulavasan to double-check). The reason we didn't prioritize supporting `num_workers > 1` is that the llama training is...
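To illustrate the issue quoted above, here is a minimal sketch of one way to keep each rank's dataloader state distinct in a checkpoint: key the serialized state by rank so distributed checkpointing doesn't collapse the entries into one shared copy. The wrapper class and key format are hypothetical, and it assumes the wrapped loader exposes `state_dict`/`load_state_dict` (as torchdata's `StatefulDataLoader` does):

```python
import pickle

import torch.distributed as dist


class RankAwareDataLoaderState:
    """Hypothetical wrapper: save/restore dataloader state per rank.

    Keying the serialized state by rank keeps each rank's progress
    distinct in the checkpoint, instead of every rank reading back
    the same (replicated) state_dict.
    """

    def __init__(self, dataloader):
        # The wrapped loader must expose state_dict/load_state_dict.
        self.dataloader = dataloader
        self.rank = dist.get_rank() if dist.is_initialized() else 0

    def state_dict(self):
        # Serialize under a per-rank key so entries are not deduplicated.
        return {
            f"dataloader_rank_{self.rank}": pickle.dumps(self.dataloader.state_dict())
        }

    def load_state_dict(self, state_dict):
        key = f"dataloader_rank_{self.rank}"
        if key in state_dict:
            self.dataloader.load_state_dict(pickle.loads(state_dict[key]))
```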

@XinDongol We appreciate your feedback a lot!

> I tried a 1B model and found that data loading time is about 10% of end-to-end time when num_workers=1 for torchtitan with on-the-fly...

@lessw2020 will connect with HF to see if they can support weight conversion from HF to PyTorch. After that, we may integrate it into the code or update the tutorial.

@rlrs Thanks, please feel free to share it here! As far as we know, HF is also working on such a script to convert from HF to DCP. As discussed...

@bkchang From the HF [website](https://huggingface.co/docs/transformers/main/en/model_doc/llama3), there's a [script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py) to convert llama weights to the HF format.
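Until an official script lands, a rough sketch of the other direction (HF checkpoint → DCP) might look like the following. The model id is a placeholder, it assumes the weights fit in host memory and a recent PyTorch where `torch.distributed.checkpoint.save` can run in a single process, and a real conversion would still have to remap parameter names to the target model definition:

```python
import torch
import torch.distributed.checkpoint as dcp
from transformers import AutoModelForCausalLM

# Load HF weights on CPU (assumes they fit in host memory).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder model id
    torch_dtype=torch.bfloat16,
)

# NOTE: a real conversion would remap HF parameter names (and undo the
# rotary q/k weight permutation HF applies) to match the target model.
state_dict = {"model": model.state_dict()}

# Write a DCP checkpoint to the given directory.
dcp.save(state_dict, checkpoint_id="./llama3_dcp_checkpoint")
```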

Closing, as we removed the feature in #535.

@qiziAI Thanks for pointing this out! Since the newly added import `_copy_state_dict` is not used by default, we don't need to require the most recent PyTorch. This is fixed...
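The fix amounts to the usual guarded-import pattern, sketched below. The module path is taken from a recent PyTorch and may move between versions, and `copy_into` is a hypothetical helper (verify the real call's argument order against your PyTorch build):

```python
# Guard the version-dependent import: only the optional code path needs
# the helper, so older PyTorch builds should still work without it.
try:
    from torch.distributed._state_dict_utils import _copy_state_dict
except ImportError:
    _copy_state_dict = None  # optional feature disabled on older builds


def copy_into(dst_state_dict, src_state_dict):
    # Hypothetical helper: fail only when the optional path is exercised.
    if _copy_state_dict is None:
        raise RuntimeError("This feature requires a more recent PyTorch build.")
    # Argument order assumed from a recent PyTorch; check your version.
    return _copy_state_dict(src_state_dict, dst_state_dict)
```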

Hi @rlrs, thanks for bringing up the concern! We are using the same definition as in the llama3 code: https://github.com/meta-llama/llama3/blob/main/llama/model.py#L65. Would you provide more details on how you verified the loaded...
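For reference, assuming the linked line sits in the rotary-embedding setup (an inference from the line number, not confirmed here), the frequency precomputation in the llama3 reference code reads roughly as follows; this is the definition a weight-conversion script has to match, since HF permutes the q/k projection weights to fit its own rotary layout:

```python
import torch


# Rotary frequency precomputation as defined in the llama3 reference code
# (paraphrased sketch; see llama/model.py in meta-llama/llama3).
def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    t = torch.arange(end, device=freqs.device, dtype=torch.float32)
    freqs = torch.outer(t, freqs)
    # Complex exponentials e^{i * t * freq}, applied to interleaved q/k pairs.
    return torch.polar(torch.ones_like(freqs), freqs)
```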