joker
When fine-tuning with `llm_reranker.finetune_for_layerwise`:

```shell
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 \
  -m FlagEmbedding.llm_reranker.finetune_for_layerwise.run \
  --output_dir ./model_ha \
  --model_name_or_path ./bge-reranker-v2-minicpm-layerwise \
  --train_data ./data/train_0425.jsonl \
  --learning_rate 2e-4 \
  --num_train_epochs 50 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 16 \
  --dataloader_drop_last...
```
While doing unified fine-tuning of bge-m3 (dense embedding, sparse embedding, and ColBERT), I found the training code is not very detailed, and I am not clear about the underlying principle. Given the data format `{"query": str, "pos": List[str], "neg": List[str]}`, is training done as binary classification over query+pos and query+neg pairs?
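For reference, a minimal sketch of what one line of such a `train_0425.jsonl` file could look like under the `{"query": str, "pos": List[str], "neg": List[str]}` format described above. The query and passage strings below are invented placeholders, not real training data:

```python
import json

# Hypothetical training record in the
# {"query": str, "pos": List[str], "neg": List[str]} format.
record = {
    "query": "what is dense retrieval",
    "pos": [
        "Dense retrieval encodes queries and passages into vectors ...",
    ],
    "neg": [
        "Sparse retrieval relies on lexical term matching ...",
        "ColBERT performs late interaction over token embeddings ...",
    ],
}

# Each line of the .jsonl training file is one such JSON object.
line = json.dumps(record, ensure_ascii=False)

# Round-trip to confirm the record keeps the expected keys and types.
parsed = json.loads(line)
print(sorted(parsed.keys()))
print(isinstance(parsed["query"], str), isinstance(parsed["pos"], list))
```

The `pos` and `neg` fields are lists, so one query can carry multiple positive and negative passages in a single record.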
### Do you need to ask a question?

- [x] I have searched the existing question and discussions and this question is not already answered.
- [x] I believe this...