tensorboy

76 comments by tensorboy

> @tensorboy Please check out the download instructions [here](https://github.com/haotian-liu/LLaVA#pretraining-dataset), thanks.

Thanks for the quick feedback. I'm still confused about the pretraining command:

```
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    llava/train/train_mem.py \
    --model_name_or_path...
```

> Hi @tensorboy, `chat.json` in this repo is the correct one to go with. You may also download and see if `len(json.load(...))` is roughly 595K.

It's 595375.
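The length check suggested in the quoted comment could be sketched roughly as follows. The helper name `count_records` and any path passed to it are assumptions for illustration; the only figures taken from the thread are the expected size of roughly 595K and the reported 595375.

```python
import json
from pathlib import Path


def count_records(path):
    """Return the number of top-level entries in a JSON file holding a list.

    `count_records` is a hypothetical helper, not part of the LLaVA repo.
    """
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        # The check only makes sense if the file is a list of samples.
        raise ValueError(f"expected a JSON list, got {type(data).__name__}")
    return len(data)
```

Calling `count_records("chat.json")` on the downloaded file (path assumed) should then give a number close to 595K if the download is complete.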

Thank you. What are your suggestions now?

> I am confused. Do you mean that although you named the folder `LLaVA-Instruct-150K/chat.json`, it actually comes from this CC3M instead?

Yes, I think I've put all...

> haotian let me try it now

> Also, can you monitor both the GPU RAM and the CPU RAM usage while you are running the code? Are they changing before the error is thrown? Your CPU RAM/GPU...
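One way to watch both numbers while training runs is sketched below. It assumes a Linux host (for `/proc/meminfo`) and that `nvidia-smi` is on the `PATH`; the function names are hypothetical and nothing here comes from the LLaVA code base.

```python
import shutil
import subprocess


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sizes)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def cpu_ram_used_mib(meminfo_path="/proc/meminfo"):
    """Used CPU RAM in MiB, computed as MemTotal - MemAvailable (Linux only)."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    return (info["MemTotal"] - info["MemAvailable"]) / 1024


def parse_gpu_csv(text):
    """Turn 'used, total' CSV rows into a list of (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(x) for x in line.split(","))
        rows.append((used, total))
    return rows


def gpu_ram_mib():
    """Query per-GPU memory via nvidia-smi; returns [] if the tool is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)
```

Printing `cpu_ram_used_mib()` and `gpu_ram_mib()` every few seconds from a second terminal should show whether either value climbs steadily before the crash.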

> lm-sys/FastChat#627

It's exactly the same issue as that FastChat one. I'm not sure what `@record` is or how to use it in your code either.

> Same errors here. Log:
> ```
> torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
>     llava/train/train_mem.py \
>     --model_name_or_path /mnt/bd/data-tns-algo-masp-llm/weights/llama-dl-main/vicuna_13B \
>     --data_path /mnt/bd/data-tns-algo-masp-llm/experiment/LLaVA/LLaVA-Instruct-150K/llava_instruct_150k.json \
>     --image_folder /mnt/bd/data-tns-algo-masp-llm/experiment/LLaVA/data/coco/train2014 \
>     --vision_tower openai/clip-vit-large-patch14 \...
> ```

> Hi, a random thought: can this be related to `ulimit`?
>
> Specifically, what's `ulimit -u`, `ulimit -v`, `ulimit -m` on your machine?
>
> And what's your OS...

> ```python
> from torch.distributed.elastic.multiprocessing.errors import record
>
> @record
> ```

Same errors:

```
LOGLEVEL=INFO TORCHELASTIC_ENABLE_FILE_TIMER=1 torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    llava/train/train_mem.py \
    --model_name_or_path /mnt/bd/data-tns-algo-masp-llm/weights/llama-dl-main/vicuna_13B \
    --data_path /mnt/bd/data-tns-algo-masp-llm/experiment/LLaVA/LLaVA-Instruct-150K/chat.json \
    --image_folder /mnt/bd/data-tns-algo-masp-llm/experiment/LLaVA/LLaVA-Instruct-150K/images \
    --vision_tower openai/clip-vit-large-patch14...
```