
How is the 8192-token maximum document length in BGE-M3's Multi-Granularity implemented?

Open chengzi-big opened this issue 1 year ago • 4 comments

chengzi-big avatar Apr 30 '24 03:04 chengzi-big

We pre-train and fine-tune bge-m3 on long texts. You can refer to our paper: https://arxiv.org/abs/2402.03216

staoxiao avatar Apr 30 '24 14:04 staoxiao
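(A minimal usage sketch, assuming the public BAAI/bge-m3 checkpoint and the BGEM3FlagModel API from this repository; the placeholder document and the printed shape are illustrative only.)

```python
from FlagEmbedding import BGEM3FlagModel

# Load bge-m3; use_fp16=True trades a little precision for faster inference.
model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

# A long document can be passed directly; max_length=8192 uses the model's
# full context window (a smaller value speeds up encoding of short texts).
long_doc = "..."  # placeholder for a document of several thousand tokens
output = model.encode([long_doc], max_length=8192)

dense_vecs = output['dense_vecs']  # dense embeddings, 1024-dim for bge-m3
print(dense_vecs.shape)
```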

What I meant is that Transformer-based models normally have a maximum input length of no more than 512 tokens. How does the BGE-M3 model extend its input to 8192 tokens?

chengzi-big avatar May 02 '24 03:05 chengzi-big

@chengzi-big , the Transformer architecture itself does not have a length limit; the limit comes from the positional encoding. We use absolute positional encoding with a maximum length of 8192.

staoxiao avatar May 02 '24 14:05 staoxiao
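(A minimal sketch to see this concretely, assuming the public BAAI/bge-m3 checkpoint on the Hugging Face Hub: the released config exposes the extended position-embedding size directly.)

```python
from transformers import AutoConfig, AutoTokenizer

# Inspect the released checkpoint's extended absolute position embeddings.
config = AutoConfig.from_pretrained("BAAI/bge-m3")
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")

# For XLM-RoBERTa-style models, max_position_embeddings includes a small
# offset for the padding index, so a value slightly above 8192 is expected.
print(config.max_position_embeddings)
print(tokenizer.model_max_length)  # sequence-length limit applied at tokenization time
```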

Thank you for your reply; your answer helped me a lot.

chengzi-big avatar May 03 '24 02:05 chengzi-big

> @chengzi-big , the Transformer architecture itself does not have a length limit; the limit comes from the positional encoding. We use absolute positional encoding with a maximum length of 8192.

@staoxiao @namespace-Pt Which absolute positional encoding did you use? Did you not use RoPE? Are the absolute position embeddings trained together with the model, and how were they initialized? Could you share some details? I read the bge-m3 paper carefully and it does not seem to mention this.

shuiyigt avatar May 31 '24 08:05 shuiyigt