LLaVA-NeXT
torch.distributed.elastic.multiprocessing.errors.ChildFailedError
Hi All,
I have set up everything from the LLaVA-NeXT repo and want to run the pretraining script on one vision dataset. The script starts, but after some time it crashes on its own with the ChildFailedError above.
I added a script to load the dataset from Hugging Face for the LLaVA-NeXT model: https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data
I have not been able to figure this out so far. Can anyone help me fix it so that I can proceed with pretraining, then fine-tuning, then inference? Please also see the screenshot for the error I get when running the script.
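As far as I understand, ChildFailedError only reports that one of the worker processes exited with a failure, so the underlying exception has to come from the worker itself. To narrow it down, here is a minimal sketch of how the worker-side traceback could be saved to a file per rank. The helper name `run_with_traceback_log`, the `worker_errors` directory, and the `train` entry point in the usage note are hypothetical names for illustration, not part of the LLaVA-NeXT repo. It assumes the launcher sets the `RANK` environment variable, as torchrun does.

```python
import os
import traceback
from pathlib import Path


def run_with_traceback_log(entry_point, log_dir="worker_errors"):
    """Run entry_point(); on failure, save the traceback per rank and re-raise."""
    # torchrun exports RANK for each worker; fall back to "0" for single-process runs.
    rank = os.environ.get("RANK", "0")
    try:
        return entry_point()
    except Exception:
        # Write the full Python traceback of this worker to its own file.
        Path(log_dir).mkdir(parents=True, exist_ok=True)
        log_path = Path(log_dir) / f"rank_{rank}.log"
        log_path.write_text(traceback.format_exc())
        # Re-raise so the process still exits non-zero and the launcher sees the failure.
        raise
```

It would be called as `run_with_traceback_log(train)` at the bottom of the training script, where `train` stands for whatever function starts training. If a worker is killed by a signal (for example when the machine runs out of memory), no Python exception is raised and no file is written, so an empty log directory after a crash would itself be a hint.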
Please share your thoughts on this.
Thanks.