Error when using the OpenELM as the LLM backbone

Open Zeqing-Wang opened this issue 1 year ago • 4 comments

Hey, TinyLLaVA Factory is great work, and I sincerely thank you for sharing it. However, I run into a problem when using OpenELM as the backbone (TinyLlama as the backbone works fine in my environment).

The problem is that I keep getting the prompt "Do you wish to run the custom code? [y/N]", even though `trust_remote_code=True` is set: [screenshot]

However, I see it has already been set in configuration_tiny_llava.py, line 101: [screenshot]

I have no idea why this happens, so I am opening this issue. Thanks in advance!

Zeqing-Wang avatar May 23 '24 15:05 Zeqing-Wang

Thank you for your question. Could you show me which script you were running when this problem occurred?

jiajunlong avatar May 24 '24 04:05 jiajunlong

Thanks for your quick reply! The script is train_tinyllama.sh. Here are the details:

```shell
DATA_PATH=/tiny_llava_datasets/text_files/blip_laion_cc_sbu_558k.json #pretrain annotation file path
FINETUNE_DATA_PATH=/tiny_llava_datasets/text_files/llava_v1_5_mix665k.json #finetune annotation file path
IMAGE_PATH=/tiny_llava_datasets/llava/llava_pretrain/images #pretrain image dir
FINETUNE_IMAGE_PATH=/data/tiny_llava_datasets #finetune image dir

LLM_VERSION=/model/OpenELM-450M-Instruct #llm path in huggingface
VT_VERSION=/model/clip-vit-large-patch14-336 #vision tower path in huggingface
VT_VERSION2="" #if you are not using mof vision tower, keep it empty
CN_VERSION=mlp2x_gelu #connector type, other options are: qformer, resampler, etc
CONV_VERSION=llama #chat template, other options are: phi, llama, gemma, etc
VERSION=base #experiment name for recording different runs
TRAIN_RECIPE=common #training recipes, other options are: lora, qlora
MODEL_MAX_LENGTH=2048 #max model length for llm

bash scripts/train/pretrain.sh "$DATA_PATH" "$IMAGE_PATH" "$LLM_VERSION" "$VT_VERSION" "$VT_VERSION2" "$CN_VERSION" "$VERSION" "$TRAIN_RECIPE" "$MODEL_MAX_LENGTH"
bash scripts/train/finetune.sh "$FINETUNE_DATA_PATH" "$FINETUNE_IMAGE_PATH" "$LLM_VERSION" "$VT_VERSION" "$VT_VERSION2" "$CN_VERSION" "$CONV_VERSION" "$VERSION" "$TRAIN_RECIPE" "$MODEL_MAX_LENGTH"
```

and the raised error is: [screenshot]

My OpenELM-450M folder contains the following (could the Python files matter?): [screenshot]

My transformers version is 4.39.3.

The only other change I made was setting `--attn_implementation eager` in pretrain.sh, because my device is a V100, which does not support FlashAttention.

Sincere thanks!

Zeqing-Wang avatar May 24 '24 04:05 Zeqing-Wang

Hi, we provide a dedicated script for training OpenELM. Please try that one: scripts/train/openelm/train_openelm.sh.

P.S. The pretraining and finetuning scripts for OpenELM differ slightly from the TinyLlama ones, so please use scripts/train/openelm/train_openelm.sh.

YingHuTsing avatar May 24 '24 07:05 YingHuTsing

Hello, OpenELM uses the tokenizer from Llama, so the tokenizer must be specified explicitly in the script. The TinyLlama script does not specify this, which leads to the error you saw. You can refer to scripts/train/openelm/pretrain_openelm.sh or scripts/train/openelm/finetune_openelm.sh for the specifics. We recommend using scripts/train/openelm/train_openelm.sh directly, which already has the basic experimental settings in place.
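To make the difference concrete, here is a hedged sketch of the extra setting involved. The variable and flag names below are illustrative assumptions, not the repository's actual argument names; check scripts/train/openelm/pretrain_openelm.sh and finetune_openelm.sh for the real ones.

```shell
# Sketch only -- OpenELM checkpoints ship no tokenizer of their own, so the
# OpenELM scripts must point at a Llama tokenizer separately from the LLM weights.
LLM_VERSION=/model/OpenELM-450M-Instruct      # OpenELM weights + custom modeling code
TOKENIZER_VERSION=meta-llama/Llama-2-7b-hf    # assumed source of the Llama tokenizer

# Hypothetical flag; the OpenELM scripts pass the tokenizer path with their
# own argument, e.g. something like:
#   --tokenizer_name_or_path "$TOKENIZER_VERSION"
```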

jiajunlong avatar May 24 '24 08:05 jiajunlong

Sorry for the late reply. I tried train_openelm.sh and it works well. Thanks for your help! I will close this issue as solved.

Zeqing-Wang avatar May 26 '24 07:05 Zeqing-Wang