How to get embeddings with a fine-tuned model [encoder-only]
I fine-tuned 'BAAI/bge-m3' with the following script:
nohup torchrun --nproc_per_node 8 \
--master_port 29505 \
-m FlagEmbedding.finetune.embedder.encoder_only.m3 \
--model_name_or_path ../BAAI/bge-m3 \
--cache_dir ../cache/model \
--train_data ../general_train_data/mini-nq-like-general-train \
--cache_path ../cache/data \
--train_group_size 8 \
--query_max_len 512 \
--passage_max_len 512 \
--pad_to_multiple_of 8 \
--knowledge_distillation False \
--same_dataset_within_batch True \
--small_threshold 0 \
--drop_threshold 0 \
--output_dir ../test_encoder_only_m3_bge-m3_sd \
--overwrite_output_dir \
--learning_rate 1e-5 \
--fp16 \
--num_train_epochs 2 \
--per_device_train_batch_size 2 \
--dataloader_drop_last True \
--warmup_ratio 0.1 \
--gradient_checkpointing \
--deepspeed ds_stage0.json \
--logging_steps 1 \
--save_steps 5000 \
--negatives_cross_device \
--temperature 0.02 \
--sentence_pooling_method cls \
--normalize_embeddings True \
--kd_loss_type m3_kd_loss \
--unified_finetuning True \
--use_self_distill True \
--fix_encoder False \
--self_distill_start_step 0 > finetune.log 2>&1 &
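For context, --train_data points at JSONL training data. A minimal sketch of one record, following the query/pos/neg layout used in the FlagEmbedding fine-tuning examples (the file name and texts here are only illustrative, not my actual data):

import json

# One record per line: a query, positive passages, and hard-negative passages.
# (Field names follow the FlagEmbedding fine-tuning examples; content is made up.)
record = {
    "query": "who wrote the declaration of independence",
    "pos": ["The Declaration of Independence was drafted by Thomas Jefferson ..."],
    "neg": ["The Constitution of the United States was signed in 1787 ..."],
}

with open("mini-nq-like-general-train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")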
Then I got the saved model in checkpoint-20000:
ls -lrt
total 1.1G
-rw-r--r-- 1 root root 701 Jan 17 19:03 config.json
-rw-r--r-- 1 root root 1.1G Jan 17 19:04 model.safetensors
-rw-r--r-- 1 root root 1.2K Jan 17 19:04 tokenizer_config.json
-rw-r--r-- 1 root root 964 Jan 17 19:04 special_tokens_map.json
-rw-r--r-- 1 root root 3.0K Jan 17 19:04 sparse_linear.pt
-rw-r--r-- 1 root root 4.9M Jan 17 19:04 sentencepiece.bpe.model
-rw-r--r-- 1 root root 2.1M Jan 17 19:04 colbert_linear.pt
-rw-r--r-- 1 root root 7.0K Jan 17 19:04 training_args.bin
-rw-r--r-- 1 root root 17M Jan 17 19:04 tokenizer.json
drwxrwxrwx 3 root root 4.0K Jan 17 19:04 global_step20000/
-rw-r--r-- 1 root root 22K Jan 17 19:04 rng_state_5.pth
-rw-r--r-- 1 root root 22K Jan 17 19:04 rng_state_0.pth
-rw-r--r-- 1 root root 16 Jan 17 19:04 latest
-rw-r--r-- 1 root root 3.4M Jan 17 19:04 trainer_state.json
-rw-r--r-- 1 root root 22K Jan 17 19:04 rng_state_7.pth
-rw-r--r-- 1 root root 22K Jan 17 19:04 rng_state_6.pth
-rw-r--r-- 1 root root 22K Jan 17 19:04 rng_state_4.pth
-rw-r--r-- 1 root root 22K Jan 17 19:04 rng_state_3.pth
-rw-r--r-- 1 root root 22K Jan 17 19:04 rng_state_2.pth
-rw-r--r-- 1 root root 22K Jan 17 19:04 rng_state_1.pth
The saved checkpoint looks quite different from the original 'BAAI/bge-m3' files, and loading it directly throws many errors. I also tried the save_ckpt_for_sentence_transformers method, but got the same error:
Traceback (most recent call last):
File "/root/paddlejob/workspace/env_run/liuli/FlagEmbedding/to_sentence_transformer_model.py", line 19, in <module>
save_ckpt_for_sentence_transformers(ckpt_dir, pooling_mode='cls', normlized=True)
File "/root/paddlejob/workspace/env_run/liuli/FlagEmbedding/to_sentence_transformer_model.py", line 6, in save_ckpt_for_sentence_transformers
word_embedding_model = models.Transformer(ckpt_dir)
File "/root/.local/virtualenvs/xxx/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py", line 78, in __init__
self._load_model(model_name_or_path, config, cache_dir, backend, **model_args)
File "/root/.local/virtualenvs/xxx/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py", line 138, in _load_model
self.auto_model = AutoModel.from_pretrained(
File "/root/.local/virtualenvs/xxx/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "/root/.local/virtualenvs/xxx/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3735, in from_pretrained
with safe_open(resolved_archive_file, framework="pt") as f:
OSError: No such device (os error 19)
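For reference, the conversion helper in to_sentence_transformer_model.py follows the usual FlagEmbedding example of wrapping the checkpoint as a sentence-transformers model. Roughly this (a reconstructed sketch from the traceback, so details may differ from my actual file):

from sentence_transformers import SentenceTransformer, models

def save_ckpt_for_sentence_transformers(ckpt_dir, pooling_mode="cls", normlized=True):
    # Wrap the HF checkpoint as transformer + pooling (+ optional normalize) and save it back.
    word_embedding_model = models.Transformer(ckpt_dir)
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension(), pooling_mode=pooling_mode
    )
    if normlized:
        modules = [word_embedding_model, pooling_model, models.Normalize()]
    else:
        modules = [word_embedding_model, pooling_model]
    model = SentenceTransformer(modules=modules, device="cpu")
    model.save(ckpt_dir)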
I have no idea how to run inference with the fine-tuned model. Can you help me?
I solved this later. During training the machine's local disk was running out of space, so the model was saved to a mounted AFS directory; reading the model directly from AFS is not supported, and copying it from the AFS directory onto local disk fixed it.
For anyone who hits this error: copy the model to your local disk and the problem will be fixed.
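Once the checkpoint is on local disk, a minimal sketch of loading it for inference (paths are placeholders; this assumes FlagEmbedding's BGEM3FlagModel, which also picks up the sparse_linear.pt and colbert_linear.pt heads if they sit in the checkpoint directory):

import shutil
from FlagEmbedding import BGEM3FlagModel

# Copy the checkpoint off the AFS mount onto local disk first (placeholder paths).
local_dir = "/root/local_models/checkpoint-20000"
shutil.copytree("/afs_mount/test_encoder_only_m3_bge-m3_sd/checkpoint-20000", local_dir)

# Load the fine-tuned bge-m3 checkpoint and encode as usual.
model = BGEM3FlagModel(local_dir, use_fp16=True)
output = model.encode(
    ["what is a large language model?"],
    return_dense=True,
    return_sparse=True,
    return_colbert_vecs=True,
)
print(output["dense_vecs"].shape)      # dense embeddings
print(output["lexical_weights"])       # sparse (lexical) weights
print(len(output["colbert_vecs"][0]))  # multi-vector (ColBERT) representations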