LLLiHaotian issues

Results: 9 issues of LLLiHaotian

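Hello, how should I go about obtaining a Chinese-English bilingual embedding model? Can I get one simply by fine-tuning the reranker on a Chinese-English bilingual corpus?

Yes. The two kinds of training data share the same format, so the same training data can be used directly. The recommended workflow is to train the embedding model first, use it to mine hard negatives, and then train the reranker on them so that it can better distinguish among the top-k results the embedding model returns. _Originally posted by @staoxiao in https://github.com/FlagOpen/FlagEmbedding/issues/157#issuecomment-1754223604_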
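The hard-negative step in that workflow can be illustrated with a minimal sketch. It is not the repository's hn_mine.py; the function name `mine_hard_negatives`, the toy 2-d vectors, and the passage ids are all hypothetical, and real embeddings would come from the trained embedding model.

```python
from typing import Dict, List, Sequence, Set


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two vectors (cosine similarity if both are L2-normalized)."""
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(
    query_vec: Sequence[float],
    corpus_vecs: Dict[str, Sequence[float]],
    positive_ids: Set[str],
    num_negatives: int = 2,
) -> List[str]:
    """Return ids of the highest-scoring passages that are not labeled positive."""
    ranked = sorted(
        corpus_vecs,
        key=lambda pid: dot(query_vec, corpus_vecs[pid]),
        reverse=True,
    )
    return [pid for pid in ranked if pid not in positive_ids][:num_negatives]


# Toy vectors standing in for embedding-model output (hypothetical data).
corpus = {
    "p1": [1.0, 0.0],  # labeled positive for the query
    "p2": [0.9, 0.1],  # scores high but is not positive: a hard negative
    "p3": [0.5, 0.5],
    "p4": [0.0, 1.0],  # scores low: an easy negative
}
hard_negs = mine_hard_negatives([1.0, 0.0], corpus, positive_ids={"p1"})
print(hard_negs)
```

The passages that the embedding model ranks highly but that are not labeled positive are the ones the reranker most needs to learn to reject, which is why they are used as its negatives.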

[Bug]: MilvusException: <MilvusException: (code=2, message=Fail connecting to server on localhost:19530. Timeout)>

### Is there an existing issue for this?
- [X] I have searched the existing issues

### Describe the bug
from pymilvus import connections
connections.connect("default", host="localhost", port="19530")
When I pass...

kind/bug
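A connection timeout like the one in this report can have many causes. As a first diagnostic, a sketch such as the following checks whether anything accepts TCP connections on the host and port from the report. It uses only the Python standard library, not pymilvus; the helper name `port_is_open` is hypothetical, and an open port does not by itself prove that a healthy Milvus server is behind it.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Host and port taken from the report above.
print(port_is_open("localhost", 19530))
```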

Questions about hard-negative mining (hn_mine.py) when fine-tuning BAAI/bge-m3-unsupervised

Hello, while fine-tuning BAAI/bge-m3-unsupervised I followed the toy_finetune_data.jsonl data format you provide {"query": "Five women walk along a beach wearing flip-flops.", "pos": ["Some women with flip-flops on, are walking along the beach"], "neg": ["The 4 women are sitting on the beach.",...

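On using other pretrained models

Multi-label classification experiments run without problems with the pretrained models bert-base-case, chinese-bert-wwm-ext, chinese-roberta-wwm-ext, and chinese-roberta-wwm-ext-large. Why, then, does the same experiment with the pretrained model roberta-xlarge-wwm-chinese-cluecorpussmall report accuracy: 0.0000, micro_f1: 0.0000, macro_f1: 0.0000 throughout training? What causes this? Any help would be appreciated.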

BGE-M3 pretraining question: the loss occasionally rises

Is it normal for the loss to rise occasionally like this, and how should I judge when to stop pretraining? [bge-m3-patent-retromae_batch56_max350.log](https://github.com/FlagOpen/FlagEmbedding/files/15360497/bge-m3-patent-retromae_batch56_max350.log)
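One general way to look past step-to-step noise in a loss curve is to smooth it before judging the trend. The sketch below is only an illustration of that idea, not a rule from FlagEmbedding: the helper names, the window size, the `min_improvement` threshold, and the toy loss values are all assumptions.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Mean of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def has_plateaued(
    losses: Sequence[float], window: int = 2, min_improvement: float = 0.01
) -> bool:
    """True if the latest smoothed loss improved by less than `min_improvement`
    over the smoothed loss `window` steps earlier (non-overlapping windows)."""
    smoothed = moving_average(losses, window)
    if len(smoothed) < window + 1:
        return False  # not enough history to compare
    return smoothed[-1 - window] - smoothed[-1] < min_improvement


# Toy values: the raw loss rises twice, yet the smoothed curve keeps falling.
noisy = [4.0, 3.0, 3.5, 2.0, 2.5, 1.0]
print(moving_average(noisy, 2))
print(has_plateaued(noisy))
```

In this toy series the raw loss goes up at two steps while every smoothed value is lower than the one before it, so occasional rises alone do not signal that training has stalled.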

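BGE-M3 pretraining question

If I want to continue pretraining the BGE-M3 model with RetroMAE on my own Chinese and English text data, should the continued training be run directly on xlm-roberta?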

Pretraining question

If I want to pretrain bart- or t5-series models with RetroMAE, how should I go about it? [bart-base-chinese-cluecorpussmall-retromae_batch256_max350.log](https://github.com/FlagOpen/FlagEmbedding/files/15302049/bart-base-chinese-cluecorpussmall-retromae_batch256_max350.log)

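About RetroMAE pretraining

Hello, I noticed that https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain provides a pretraining example. As I understand it, this example pretrains a model from scratch; is that correct? If so, how can I take an existing pretrained model and continue training it with the RetroMAE algorithm on unsupervised data from a specialized domain (for example, patent titles)? My data follows the format you provide: {"text": "一种用于溶胶法SERS检测的微流控芯片及其使用方法"} {"text": "一种钢筋防腐用韧性涂料及其涂覆方法"}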
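For reference, unsupervised text in the one-object-per-line {"text": ...} layout shown above can be produced with the standard json module. This is a minimal sketch; the helper name `to_jsonl` is hypothetical, and skipping blank entries is an assumption rather than a documented requirement.

```python
import json
from typing import Iterable


def to_jsonl(texts: Iterable[str]) -> str:
    """Serialize each non-blank text as one {"text": ...} JSON object per line."""
    rows = [
        # ensure_ascii=False keeps Chinese characters readable in the file.
        json.dumps({"text": t.strip()}, ensure_ascii=False)
        for t in texts
        if t.strip()
    ]
    return "\n".join(rows) + ("\n" if rows else "")


titles = [
    "一种用于溶胶法SERS检测的微流控芯片及其使用方法",
    "   ",  # blank entry, skipped
    "一种钢筋防腐用韧性涂料及其涂覆方法",
]
jsonl_text = to_jsonl(titles)
print(jsonl_text, end="")
```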

Error during supervised fine-tuning

Could you help me see why this error occurs when fine-tuning bge-m3? Previously, with 1 query, 1 pos, and 10 neg per example, fine-tuning ran normally. After changing to 1 query, 11 pos, and 10 neg, an error is reported, although I checked train_data and found nothing wrong. The fine-tuning command is as follows: nohup \ torchrun --nproc_per_node 2 \ -m FlagEmbedding.baai_general_embedding.finetune.run \ --output_dir /bgem3/supervised_simcse_fine-tune \ --model_name_or_path /bgem3 \ --train_data query_pos_neg_data.jsonl \ --learning_rate 1e-5 \ --fp16 \ --num_train_epochs 200 \ --per_device_train_batch_size...
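When checking a train_data file by eye, a small script can help rule out malformed records. The sketch below assumes the query/pos/neg layout of toy_finetune_data.jsonl quoted earlier on this page. The rules it enforces (a non-empty string query, and non-empty lists of strings for pos and neg) are assumptions for illustration, not FlagEmbedding's documented requirements, and the sample records are hypothetical.

```python
import json
from typing import Iterable, List


def check_records(lines: Iterable[str]) -> List[str]:
    """Return a human-readable list of problems found in JSONL training lines."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(rec, dict):
            problems.append(f"line {line_no}: record is not a JSON object")
            continue
        query = rec.get("query")
        if not isinstance(query, str) or not query.strip():
            problems.append(f"line {line_no}: 'query' must be a non-empty string")
        for key in ("pos", "neg"):
            value = rec.get(key)
            if not isinstance(value, list) or not value:
                problems.append(f"line {line_no}: '{key}' must be a non-empty list")
            elif not all(isinstance(t, str) and t.strip() for t in value):
                problems.append(f"line {line_no}: '{key}' has a non-string or empty entry")
    return problems


# Hypothetical sample: line 1 is fine, lines 2 and 3 each have one problem.
sample = [
    '{"query": "q1", "pos": ["p1"], "neg": ["n1", "n2"]}',
    '{"query": "q2", "pos": [], "neg": ["n1"]}',
    '{"query": "q3", "pos": ["p1", null], "neg": ["n1"]}',
]
problems = check_records(sample)
for problem in problems:
    print(problem)
```

In practice the lines would come from iterating over the opened training file instead of the in-memory `sample` list.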