LLaVA-NeXT
Training fails with tons of missing imports in llava_trainer.py
I'm trying to train a model, but llava/train/llava_trainer.py uses many names that it never imports.
I followed the installation steps in README.md:
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"
But when I ran
LLaVA-NeXT/scripts/train/pretrain_siglip.sh
which calls llava/train/train_mem.py -> train.py -> llava_trainer.py,
I get errors like:
NameError: name 'get_model_param_count' is not defined
NameError: name 'is_torch_xla_available' is not defined
NameError: name 'DistributedType' is not defined
NameError: name 'DebugOption' is not defined
These errors occur because the trainer file calls functions and classes that it never imports. I had to add the following imports manually just to get past the first few errors:
from transformers.debug_utils import DebugOption, DebugUnderflowOverflow
from transformers.integrations import deepspeed_init
from transformers import TrainerState
from transformers.trainer_pt_utils import get_model_param_count
from transformers.utils import is_torch_xla_available
from accelerate.utils import DistributedType
import math
import time
import numpy as np
import sys
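Whether each of these imports resolves depends on the installed transformers and accelerate versions, so it can help to check them before launching a run. The following is a minimal sketch: the helper name missing_symbols is hypothetical and not part of LLaVA-NeXT, and the module paths to check are simply the ones listed above.

```python
import importlib


def missing_symbols(required):
    """Return the (module, attribute) pairs that cannot be resolved.

    `required` is an iterable of (module_name, attribute_name) tuples.
    A pair is reported when the module cannot be imported or when the
    module does not expose the attribute.
    """
    missing = []
    for module_name, attr in required:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself is not available in this environment.
            missing.append((module_name, attr))
            continue
        if not hasattr(module, attr):
            # The module exists but does not provide this name.
            missing.append((module_name, attr))
    return missing
```

For example, calling missing_symbols([("transformers.trainer_pt_utils", "get_model_param_count"), ("transformers.utils", "is_torch_xla_available"), ("accelerate.utils", "DistributedType")]) returns an empty list when all three names are available in the current environment.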
Is this a bug in the code or am I missing something in my installation/setup?
Update: Found the root cause
I found the reason for these import errors. This error is caused by PR #469 which added MeZO support to the LLaVA trainer.
The additional support for MeZO required overriding the _inner_training_loop function of the HuggingFace Trainer (as seen in this commit). However, the imports used in the overridden _inner_training_loop function are not properly imported in llava_trainer.py.
To fix this issue, the imports that the original HF Trainer relies on for this function need to be added to llava_trainer.py.
@tatarinovst2 Could you add the missing imports from your MeZO implementation? @Luodian This is blocking users from training - might need a quick fix.
Did you encounter another error?
When I ran LLaVA-NeXT/scripts/train/pretrain_siglip.sh,
I got the following errors:
NameError: name 'TrainOutput' is not defined
NameError: name 'plot_graphs_based_on_log_history' is not defined
NameError: name 'speed_metrics' is not defined
The same problem.
Did you encounter another error? When I ran
LLaVA-NeXT/scripts/train/pretrain_siglip.sh, I got the following errors: NameError: name 'TrainOutput' is not defined, NameError: name 'plot_graphs_based_on_log_history' is not defined, NameError: name 'speed_metrics' is not defined
I think this is probably the same problem. As a workaround, you can check out the code as it was before this PR was merged: https://github.com/LLaVA-VL/LLaVA-NeXT/pull/469
I’m running into the same problem. I opened PR #493 that adds the missing imports and applies minor formatting updates in llava/train/llava_trainer.py. After this change, scripts/train/pretrain_clip.sh runs without import errors on my side.
I’d be grateful for any feedback or suggestions for improvement. Thanks!