[Bug] [BAAI/bge-m3] Embedding error during build

Open Bboyjie opened this issue 11 months ago • 3 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Operating system information

Windows

What happened

Exception details: openai.APIStatusError: Error code: 413 - {'code': 20042, 'message': 'input batch size 83 > maximum allowed batch size 64', 'data': None}

The embedding configuration is as follows:

vectorizer = kag.common.vectorizer.OpenAIVectorizer
model = BAAI/bge-m3
api_key = **
base_url = https://api.siliconflow.cn/v1
vector_dimensions = 1024

How to reproduce

KAG 0.5.1

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Bboyjie, Jan 07 '25 12:01

It looks like too many entities and relations were extracted from a single chunk, so the number of properties that need to be vectorized in one request (83) exceeds the maximum batch size allowed by SiliconFlow (64).

You can try reducing the splitter's chunk size.

caszkgui, Jan 07 '25 12:01

Thank you for your reply. I reduced 'window_length' from 200 to 100 and kept 'split_length' at 500, and the build now works fine. I understand the problem now. Thank you.

Bboyjie, Jan 07 '25 13:01

Additionally, you can upgrade KAG to version 0.6, where the vectorizer's default batch size is set to 32.

zhuzhongshu123, Jan 09 '25 02:01
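
For readers who cannot upgrade or shrink their chunks, the idea behind a vectorizer batch size can be sketched on the client side: split the list of texts into slices no larger than the provider's limit and embed each slice in its own request. This is only an illustration and not KAG's actual implementation. The names `embed_in_batches` and `fake_embed_batch` are hypothetical, and the stand-in embedder merely imitates the 64-input limit from the error message above so the example runs offline.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    max_batch_size: int = 32,
) -> List[List[float]]:
    """Embed `texts` in slices of at most `max_batch_size` items.

    `embed_batch` is any callable that takes a list of strings and returns
    one vector per string, in the same order (hypothetical interface).
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), max_batch_size):
        batch = list(texts[start:start + max_batch_size])
        vectors.extend(embed_batch(batch))  # one request per slice
    return vectors


# Stand-in for a remote embedding endpoint (assumption, for illustration only):
# it rejects more than 64 inputs per call and returns a 1-dimensional "vector".
seen_batch_sizes: List[int] = []


def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    if len(batch) > 64:
        raise ValueError(
            f"input batch size {len(batch)} > maximum allowed batch size 64"
        )
    seen_batch_sizes.append(len(batch))
    return [[float(len(text))] for text in batch]


# 83 inputs, as in the reported error, sent as slices of 32, 32 and 19.
texts = [f"property {i}" for i in range(83)]
vectors = embed_in_batches(texts, fake_embed_batch, max_batch_size=32)
print(len(vectors), seen_batch_sizes)
```

With a real OpenAI-compatible endpoint, `embed_batch` could wrap a call such as `client.embeddings.create(model="BAAI/bge-m3", input=batch)` from the `openai` package and return `[item.embedding for item in response.data]`. Whether KAG 0.5.1 offers a hook to plug in such a wrapper is not stated in this thread.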