KookHoiKim
**Describe the bug** I tried to use the LLaVA example and hit a key-mismatch error when loading the checkpoint. I am on the latest commit on the main branch (094d66b). [rank0]: RuntimeError: Error(s) in loading state_dict...
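For reference, a quick way to see which keys mismatch before `load_state_dict` raises is to diff the two key sets. This is a minimal sketch, not from the issue, assuming a Megatron-style checkpoint whose weights sit under a `"model"` entry; `model` and `ckpt_path` stand in for the LLaVA module and checkpoint path from the example:

```python
import torch

def diff_state_dict_keys(model: torch.nn.Module, ckpt_path: str) -> None:
    """Print the keys that differ between a model and a saved checkpoint."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Megatron checkpoints typically nest the weights under a "model" key;
    # fall back to the raw dict otherwise (an assumption, adjust as needed).
    state = ckpt.get("model", ckpt)
    model_keys = set(model.state_dict().keys())
    ckpt_keys = set(state.keys())
    print("missing from checkpoint:", sorted(model_keys - ckpt_keys))
    print("unexpected in checkpoint:", sorted(ckpt_keys - model_keys))
```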
**Describe the bug** I am currently working with the LLaVA model in Megatron. I tested tensor parallelism and it works well. However, when I set pipeline parallelism, it gets stuck during initialization....
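For context, Megatron's model-parallel groups are set up through `megatron.core.parallel_state`; the sketch below shows roughly where that initialization happens, assuming a torchrun launch (the `tp`/`pp` values are illustrative, not taken from the issue):

```python
import os
import torch
from megatron.core import parallel_state

def init_parallel(tp: int, pp: int) -> None:
    """Initialize torch.distributed and Megatron's model-parallel groups.

    Assumes launch via torchrun, which sets RANK, WORLD_SIZE, LOCAL_RANK,
    MASTER_ADDR, and MASTER_PORT in the environment.
    """
    torch.distributed.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    parallel_state.initialize_model_parallel(
        tensor_model_parallel_size=tp,
        pipeline_model_parallel_size=pp,
    )

# e.g. 2-way tensor parallel x 2-way pipeline parallel on 4 GPUs:
# init_parallel(tp=2, pp=2)
```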
**Describe the bug** I followed [llama_mistral.md](https://github.com/NVIDIA/Megatron-LM/blob/main/docs/llama_mistral.md) using the Mistral 7B model (and also a Llama model). However, it raises the error below. ```using world size: 1, data-parallel size: 1, context-parallel size: 1...```
In my understanding, the pretrain code broadcasts the data from TP rank 0 to the GPUs on the remaining TP ranks. However, if I activate the option `train_valid_test_datasets_provider.is_distributed = True` while...
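For illustration, the broadcast pattern described above looks roughly like the sketch below, modeled loosely on Megatron's `broadcast_data` helper in `megatron.core.tensor_parallel`, which broadcasts the sizes before the data. Here `shape`, `dtype`, `src_rank`, and `tp_group` are assumed to be agreed on by all ranks ahead of time:

```python
import torch
import torch.distributed as dist

def broadcast_batch(batch, src_rank: int, tp_group, shape, dtype=torch.int64):
    """Broadcast a batch from TP rank 0 to the other ranks in its TP group.

    Only the source rank holds real data; the others allocate an empty
    buffer of the agreed shape/dtype and receive into it.
    """
    if dist.get_rank() == src_rank:
        tensor = batch.to(device="cuda", dtype=dtype)
    else:
        tensor = torch.empty(shape, dtype=dtype, device="cuda")
    dist.broadcast(tensor, src=src_rank, group=tp_group)
    return tensor
```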