[Bug] [BAAI/bge-m3] Embedding error during build
Search before asking
- [X] I had searched in the issues and found no similar issues.
Operating system information
Windows
What happened
Exception details: openai.APIStatusError: Error code: 413 - {'code': 20042, 'message': 'input batch size 83 > maximum allowed batch size 64', 'data': None}

The embedding configuration is as follows:

vectorizer = kag.common.vectorizer.OpenAIVectorizer
model = BAAI/bge-m3
api_key = **
base_url = https://api.siliconflow.cn/v1
vector_dimensions = 1024
How to reproduce
KAG 0.5.1
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
It looks like too many entities and relations were extracted from a single chunk, so the number of properties to be vectorized in one request exceeds the maximum batch size allowed by SiliconFlow.
You can try reducing the splitter's chunk size.
Thank you for your reply. I reduced `window_length` from 200 to 100 and kept `split_length` at 500, and the build now works. I understand the problem now. Thank you.
Additionally, you can upgrade KAG to version 0.6, where the vectorizer's default batch size is 32.
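
For anyone who cannot upgrade or change the splitter settings, the same effect can be had by splitting the inputs on the client side before they reach the embedding endpoint. The following is only a minimal sketch of that idea, not KAG's actual implementation: `embed_in_batches` and its `embed_batch` callback are hypothetical names introduced here, and the default of 64 is taken from the limit quoted in the error message above.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    max_batch_size: int = 64,  # assumption: the limit reported in the 413 error
) -> List[List[float]]:
    """Embed `texts` in slices of at most `max_batch_size` items.

    `embed_batch` is a hypothetical callback that sends one request to the
    embedding service and returns one vector per input, in the same order.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")

    vectors: List[List[float]] = []
    for start in range(0, len(texts), max_batch_size):
        # Each slice stays within the provider's batch limit.
        batch = list(texts[start:start + max_batch_size])
        vectors.extend(embed_batch(batch))
    return vectors
```

With the `openai` Python client, `embed_batch` could be a small function that calls `client.embeddings.create(model="BAAI/bge-m3", input=batch)` and returns `[item.embedding for item in response.data]`. With a limit of 64, the 83 inputs from the error above would be sent as two requests of 64 and 19 items.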