FlagEmbedding issues

How can I adjust only specific layers, rather than all layers, of the reranker and bi-encoder in your project (e.g. adjust only the classifier layer)?

1

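Hello, I noticed that https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain provides a pre-training example. My understanding is that this example pre-trains a model from scratch; is that correct? If so, how can I run a second round of RetroMAE-based training on an existing pre-trained model, using unsupervised data from a specialized domain (for example, patent titles)? My data follows the format you provide: {"text": "一种用于溶胶法SERS检测的微流控芯片及其使用方法"} {"text": "一种钢筋防腐用韧性涂料及其涂覆方法"}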

LLLiHaotian
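
For the data format quoted in the question above, the following is a minimal sketch of writing raw titles as one {"text": ...} JSON object per line. The helper names and the file name patent_titles.jsonl are hypothetical, and the sketch covers data preparation only; it does not show how to launch RetroMAE training.

```python
import json
import os
import tempfile


def write_jsonl(texts, path):
    """Write one {"text": ...} object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back, skipping blank lines, to check that every line parses."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# The two sample titles quoted in the issue.
titles = [
    "一种用于溶胶法SERS检测的微流控芯片及其使用方法",
    "一种钢筋防腐用韧性涂料及其涂覆方法",
]

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "patent_titles.jsonl")
write_jsonl(titles, path)
records = read_jsonl(path)
```

Reading the file back before training is a cheap way to catch malformed lines early.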

all_gather timeout

2

1%|▏ | 4817/360910 [38:19

trillionmonster

The max_position_embeddings is only 512 and 514 in config.py of "bge-reranker-large" and "bge-large-zh-v1.5", which means "max_input_token" cannot exceed 512/514

1

The max_position_embeddings is only 512 and 514 in config.py of "bge-reranker-large" and "bge-large-zh-v1.5", which means "max_input_token" cannot exceed 512/514. How can I solve this problem?

xwqianbei
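
One common workaround for a fixed position limit like the one described above is to split a long token sequence into overlapping windows that each fit the limit, then encode or score each window separately. The sketch below is an illustration under stated assumptions, not the maintainers' answer: split_into_windows, max_len, and stride are hypothetical names, and the input is a plain list of token ids so that no tokenizer is required.

```python
def split_into_windows(token_ids, max_len=512, stride=256):
    """Split token_ids into overlapping windows of at most max_len tokens.

    Consecutive windows start `stride` tokens apart, so neighbouring
    windows overlap by max_len - stride tokens.
    """
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("require max_len > 0 and 0 < stride <= max_len")
    if len(token_ids) <= max_len:
        return [list(token_ids)]

    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += stride
    return windows


# Stand-in for the token ids of a 1200-token document.
ids = list(range(1200))
windows = split_into_windows(ids, max_len=512, stride=256)
```

In practice max_len would be set a few tokens below the model limit to leave room for special tokens, and how the per-window embeddings or scores are combined (for example, taking the maximum or the mean) is a separate design choice.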

protocol_prefix = fs.protocol + "://" if fs.protocol != "file" else ""

1

Fine-tuning command: torchrun --nproc_per_node 1 -m FlagEmbedding.reranker.run --output_dir ./FlagEmbedding/reranker/rerank_output --model_name_or_path ./model/BAAIbge-reranker-base --train_data ./examples/reranker/toy_finetune_data.jsonl --learning_rate 6e-5 --fp16 --num_train_epochs 2 --per_device_train_batch_size 1 --gradient_accumulation_steps 4 --dataloader_drop_last True --train_group_size 16 --max_len 512 --weight_decay 0.01 --logging_steps 10...

sunyue-s

bge-reranker-v2-m3 environment setup question

1

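Local environment: python=3.8, pytorch=1.11.0, nvidia-smi=450.119.04, cuda version=11.0, nvcc -V=11.3.109. Running pip install FlagEmbedding succeeded. In the Python console I ran: from FlagEmbedding import FlagReranker; reranker = FlagReranker('/root/workspace/bge-reranker-v2-m3', use_fp16=True). The console is unresponsive for about a minute, then prints "killed" and exits. Is this caused by my environment configuration or by something else? And where can I find the detailed error log?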

ericalduo

On the interpretability of embedding vectors

2

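Hello, and thank you for the great work. I have been using the bge model series and have a question: is it possible to find out which keywords or spans in the original text contribute most to the final embedding? And could keyword-weight information be introduced, so that the spans of interest are manually given a higher weight when the embedding vector is generated? Have you done any research on this, or can you suggest good references? Thanks!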

Gladiator566

Add 'encoder-decoder' support

4

I have been using LM-Cocktail for merging language models, specifically the 'mix_models_with_data' function. However, I noticed there are only implementations for encoder or decoder models, not encoder-decoder. Maybe it'd be nice...

RKoopal

Finetuning

1

How can I fine-tune on Kaggle or Colab?

thanhtruongtran

How to add my own dataset to C-MTEB for evaluation

Question about RetroMAE pre-training