FlagEmbedding issues

cannot import name 'Gemma2FlashAttention2' from 'transformers.models.gemma2.modeling_gemma2'

6

The latest version of transformers is required to fix `cannot import name 'is_torch_greater_or_equal_than_2_0' from 'transformers.pytorch_utils'`. After upgrading to transformers 4.48.2, FlagEmbedding raises this import error:

```
File /opt/homebrew/lib/python3.11/site-packages/FlagEmbedding/inference/reranker/decoder_only/models/gemma_model.py:56...
```

mxchinegod
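A minimal sketch of how one might check, before importing, whether an installed module still exposes a given name such as the one in the error above. The helper `has_symbol` is hypothetical (it is not part of FlagEmbedding or transformers) and uses only the standard library.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is missing, or one of its own imports failed.
        return False
    return hasattr(module, symbol)


# Example call (result depends on the installed transformers version):
# has_symbol("transformers.models.gemma2.modeling_gemma2", "Gemma2FlashAttention2")
```

A `False` result for the commented call would be consistent with the import error reported here, though it does not by itself say which transformers versions still provide the name.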

Performance for BGE-M3 inference dropped between 1.2.x and 1.3.x

1

Comparing the code from 1.2.x and 1.3.x, inference shows a performance regression of up to 100%. The performance degrades in subsequent calls to `model.encode`; `M3Embedder.encode_single_device` is 2x slower than the original 1.2.x...

ivlcic
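To see whether repeated calls get slower, as described in the report above, one could time each call separately rather than averaging. The sketch below is an assumption about how such a measurement might look; `time_calls` is a hypothetical helper and not part of FlagEmbedding.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call `fn` `repeats` times and return the wall-clock seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Example call, assuming `model` and `sentences` come from your own setup:
# print(time_calls(lambda: model.encode(sentences), repeats=10))
```

Keeping the per-call durations makes it visible whether only the first call is slow (warm-up) or whether later calls degrade. Running the same script under both package versions gives comparable numbers.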

Is there a plan to support gemma3 1B and 4B embedding/reranker?

1

Gemma3 is an impressive piece of work, and FlagEmbedding already supports Gemma embedding and reranker models. Is there a plan to train new embedding and reranking models based on Gemma3? Thanks!

xiaofan-luan

Core Dumped

2

As shown in the image: ![core-dumped](https://github.com/user-attachments/assets/b27d1327-7410-4768-9207-de7a05de9c0a)

zipzou

Clarification on train_group_size and GPU Utilization for Negative Samples in Latest Version

3

I am currently attempting to fine-tune bge-m3 and have been referring to the following documentation: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/embedder#2-bge-m3 1. The default value for `train_group_size` was previously set to 2 but has now...

zhongxifang

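What is the reasoning behind tokenize_chinese_chars defaulting to true in tokenizer_config.json?

When using bge-small-zh-v1.5, I found that this parameter defaults to true. As a result, pre-tokenization adds a space before and after every Chinese character in the input sentence before any further processing, so a considerable share of the vocab, tokens such as ##你 and ##好, is never used. To make use of these tokens, I set the parameter to false during fine-tuning, which caused a clear drop in quality. So I would like to know: was this parameter also set to true by default during pre-training? Thanks for your answer.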

wiwuwiwu
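The effect described in the question above can be illustrated with a simplified sketch. This is not the actual tokenizer code: `is_cjk`, `pad_cjk`, and `pre_tokenize` are hypothetical names, and the character check covers only the main CJK Unified Ideographs block.

```python
def is_cjk(ch: str) -> bool:
    # Simplified assumption: only the main CJK Unified Ideographs block.
    return 0x4E00 <= ord(ch) <= 0x9FFF


def pad_cjk(text: str) -> str:
    """Surround every CJK character with spaces; leave other characters alone."""
    return "".join(f" {ch} " if is_cjk(ch) else ch for ch in text)


def pre_tokenize(text: str, tokenize_chinese_chars: bool = True) -> list:
    """Split text into whitespace-delimited words, optionally padding CJK first."""
    if tokenize_chinese_chars:
        text = pad_cjk(text)
    return text.split()


print(pre_tokenize("你好world"))         # each Chinese character is its own word
print(pre_tokenize("你好world", False))  # the whole run stays one word
```

With padding enabled, every Chinese character reaches the subword step as a separate word, so a word-continuation piece such as ##好 has nothing to attach to. With padding disabled, the run stays together and continuation pieces can apply.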

RuntimeError raised during fine-tuning

2

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please...

ariessweety07

The embedding values returned by calling the BGEM3FlagModel.encode() method are different

1

1. Code:

    from FlagEmbedding import BGEM3FlagModel

    model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)
    print(model.encode(['萌龙大乱斗'])['dense_vecs'])
    print(model.encode(['萌龙大乱斗', '《萌龙大乱斗》是一款由Gameloft开发的以龙为主题的模拟经营游戏。在游戏中，玩家可以按照自己的心意育成各具个性的龙，并与它们一起踏上冒险之旅。养育龙的过程中，你不仅需要帮助它们建造住所、喂食以提升等级，还要通过不断训练来强化技能，当然，让不同属性的龙相互交配产出具有特殊性质的个体也是其中的一大乐趣。在包罗万象的游戏中繁育靓龙参加战斗，开启精彩的冒险之旅！与栩栩如生的萌龙互动嬉戏，在龙之岛修建栖息地，在妙趣横生的小游戏中喂养幼龙，带着它们去探索广袤世界！还想更过瘾？那就去3对3的战斗中挑战好友或维京人吧！让最强大的成年龙杂交来获取新的龙种以完成收藏，甚至还能获得神秘的传奇龙哟！★游戏特色★- 萌龙们变得前所未有的炫酷啦！令人叹为观止的视觉效果让缤纷多彩的岛屿和龙之岛居民们栩栩如生！- 萌宠们离不开你的悉心照料！给它们喂食，抚摸并关爱它们，获得额外的金币和特殊奖励。 - 超过350块拼图助你打造豪华靓龙收藏，让你的可爱萌龙朋友们成群结队而来！- 带领你的龙战队称霸岛屿！多多参与战斗，扩大萌龙收藏并提升技能，向更高级的联赛奋力冲刺！- 每次更新都会为你带来新的季赛和任务，保证让你乐此不疲！- 争当战场枭雄！去竞技场挑战对手，获得超凡大奖！- 结交好友，访问它们的岛屿并交换礼物！- 创建运筹帷幄的部落！利用部落聊天功能，大家群策群力，制定最佳作战策略，或者讨论各自的计划打算！', '合成', '模拟经营', 'a', 'bb'])['dense_vecs'])

2. Result: first text "萌龙大乱斗" embedding value...

hemintang
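One way to quantify how far apart two returned vectors are, for instance the embedding of the same text from the two calls above, is sketched below. The helpers `max_abs_diff` and `cosine_similarity` are hypothetical and written in plain Python; they make no claim about why the values differ.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    _check_same_length(a, b)
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Example call, assuming `single` and `batched` hold the two 'dense_vecs' results:
# print(max_abs_diff(single[0], batched[0]), cosine_similarity(single[0], batched[0]))
```

A cosine similarity very close to 1 alongside a small maximum difference would suggest the two vectors differ only numerically, while a clearly lower similarity would point to a more substantial discrepancy.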

Exception thrown during cleanup at process exit

9

Environment:

* Mac silicon
* Python 3.12.8
* FlagEmbedding==1.3.3

```
Exception ignored in: Traceback (most recent call last):
  File "/xxx/.venv/lib/python3.12/site-packages/FlagEmbedding/abc/inference/AbsEmbedder.py", line 270, in __del__
  File "/xxx/.venv/lib/python3.12/site-packages/FlagEmbedding/abc/inference/AbsEmbedder.py", line 89, in stop_self_pool...
```

patricksuo

I'm trying HNM via hn_mine.py, but the hard negatives are gibberish.

2

Hi, I'm trying to do HNM via hn_mine.py. The dataset looks like this:

```python
# sample.jsonl (120k rows)
{ "query": "사채권자가 자본금 감소에 대하여 이의를 제기하려면 사채권자집회의 결의가 있어야 하나,...
```

seongjiko
