[Bug] [KAG] Error when running the bundled examples: Index query vector has 1536 dimensions, but indexed vectors have 1024.

Open zzyyll2 opened this issue 11 months ago • 6 comments

Search before asking

  • [x] I had searched in the issues and found no similar issues.

Operating system information

Linux

What happened

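After starting the four services with the docker-compose file from dev/release, I logged in to the UI, changed the password, and configured the global parameters; uploading knowledge through the page worked normally. However, running the KAG examples example/baike or medicine fails with the same error. The failing step is cd builder && python indexer.py: org.neo4j.driver.exceptions.ClientException: Failed to invoke procedure db.index.vector.queryNodes: Caused by: java.lang.IllegalArgumentException: Index query vector has 1536 dimensions, but indexed vectors have 1024.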

How to reproduce

git clone kag

  1. cd example/baike
  2. Edit kag_config.yaml
  3. knext project restore .
  4. knext schema commit
  5. cd builder && python indexer.py (this step fails with the following error)

org.neo4j.driver.exceptions.ClientException: Failed to invoke procedure db.index.vector.queryNodes: Caused by: java.lang.IllegalArgumentException: Index query vector has 1536 dimensions, but indexed vectors have 1024.

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

zzyyll2 avatar Jan 17 '25 04:01 zzyyll2
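
The error above means the vector sent to the index has a different length than the vectors the index was created for. A minimal sketch of a pre-flight check that catches this before indexing follows; `assert_same_dimensions` is a hypothetical helper (not part of KAG or the Neo4j driver), and the list of zeros merely stands in for a real embedding.

```python
def assert_same_dimensions(query_vector, index_dimensions):
    """Fail early, before the vector ever reaches the vector index.

    Hypothetical helper: it only compares lengths and mirrors the wording
    of the error reported in this issue.
    """
    actual = len(query_vector)
    if actual != index_dimensions:
        raise ValueError(
            f"Index query vector has {actual} dimensions, "
            f"but indexed vectors have {index_dimensions}."
        )
    return actual


# Stand-in for an embedding returned by a 1536-dimensional model.
fake_embedding = [0.0] * 1536

try:
    # 1024 is the index size reported in the error message of this issue.
    assert_same_dimensions(fake_embedding, index_dimensions=1024)
except ValueError as err:
    message = str(err)
    print(message)
```

Running such a check once against a single test embedding is cheaper than discovering the mismatch partway through a build.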

  • Edit kag_config.yaml

Image

caszkgui avatar Jan 17 '25 07:01 caszkgui

I have already edited kag_config.yaml, as shown below:

Image

zzyyll2 avatar Jan 17 '25 09:01 zzyyll2

I found the cause: the API call does not pass dimensions. Why is dimensions not passed here?

Image

zzyyll2 avatar Jan 17 '25 17:01 zzyyll2

> I found the cause: the API call does not pass dimensions. Why is dimensions not passed here?

Image

The vector dimension of the text-embedding-ada-002 model is fixed at 1536. It is not a configurable parameter.

xionghuaidong avatar Jan 21 '25 02:01 xionghuaidong
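
Because the output size of such a model is fixed, the configured `vector_dimensions` has to be set to the model's native size rather than the other way around. The sketch below validates a config against a small lookup table. It is illustrative only: the `model` key and the `check_vectorizer_config` helper are assumptions, and the table lists just the two models named in this thread.

```python
# Native output sizes of models mentioned in this thread.
NATIVE_DIMENSIONS = {
    "text-embedding-ada-002": 1536,  # fixed, cannot be changed per request
    "bge-m3": 1024,                  # dense embedding size
}


def check_vectorizer_config(config):
    """Return a problem description, or None if nothing looks wrong.

    `config` is a plain dict; the key "model" is an assumed name, while
    "vector_dimensions" is the setting discussed in this issue.
    """
    model = config["model"]
    configured = config["vector_dimensions"]
    native = NATIVE_DIMENSIONS.get(model)
    if native is None:
        return None  # unknown model: nothing to compare against
    if configured != native:
        return (
            f"vector_dimensions is {configured}, "
            f"but {model} produces {native}-dimensional vectors"
        )
    return None


problem = check_vectorizer_config(
    {"model": "text-embedding-ada-002", "vector_dimensions": 1024}
)
print(problem)
```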

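OK, I understand now: the value of vector_dimensions must correspond to the model. I hope this tip can be added to the documentation so that others can avoid the same detour. Thanks.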

Image

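As shown in the image above, suppose I use text-embedding-3-large, whose dimension is variable, with a minimum of 512 and a maximum of 3072. If I use this model, is setting vector_dimensions enough? Can I set the value to 512?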

zzyyll2 avatar Jan 21 '25 07:01 zzyyll2
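
For context on the question above: the OpenAI embeddings endpoint accepts an optional `dimensions` field for the text-embedding-3 models, and only then does the response shrink to the requested size. Whether KAG forwards that field is not established in this thread, so the sketch below only builds the request body; `build_embedding_request` is a hypothetical helper, not KAG code.

```python
# Models for which the OpenAI API accepts the optional "dimensions" field.
RESIZABLE_MODELS = {"text-embedding-3-small", "text-embedding-3-large"}


def build_embedding_request(model, text, dimensions=None):
    """Build the JSON body for an embeddings request (hypothetical helper)."""
    payload = {"model": model, "input": text}
    if dimensions is not None:
        if model not in RESIZABLE_MODELS:
            raise ValueError(
                f"{model} has a fixed output size; omit 'dimensions'"
            )
        payload["dimensions"] = dimensions
    return payload


request_body = build_embedding_request(
    "text-embedding-3-large", "hello", dimensions=512
)
print(request_body)
```

If the field is never sent, the model returns its default size, so setting `vector_dimensions` to 512 on the index side alone would reproduce the mismatch from this issue.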

I ran into a similar mismatch between query vector and index vector dimensions while testing. I requested the bge-m3 model through the openai model interface, but the returned embedding had 512 dimensions instead of 1024, and its values were all close to 0. The HTTP request in the code appears to set encoding_format to base64 instead of float; when I set it to float, the response had 1024 dimensions. I'm not sure whether this parameter is the cause of the mismatched dimensions. Do the indexing and querying code share the same embedding procedure and parameters? If so, there should be no mismatch between indexing and querying.

Image

Image

zoidburg avatar Feb 25 '25 03:02 zoidburg
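
One possible reading of the 512-versus-1024 observation, not confirmed anywhere in this thread, is an element-width mismatch when the base64 payload is decoded: the same bytes yield half as many values if they are read as 8-byte floats instead of 4-byte floats. The sketch below demonstrates only that arithmetic with synthetic data; the assumption that a server encodes float32 and a client decodes float64 is mine, not something the reporters verified.

```python
import base64
import struct

# Synthetic 1024-dimensional "embedding" with small values.
values = [0.01 * ((i % 7) - 3) for i in range(1024)]

# Assumed server side: pack as little-endian float32, then base64-encode.
raw = struct.pack("<1024f", *values)
encoded = base64.b64encode(raw).decode("ascii")


def decode(encoded_text, fmt_char):
    """Decode a base64 payload as little-endian floats of the given width."""
    data = base64.b64decode(encoded_text)
    width = struct.calcsize("<" + fmt_char)  # 4 for "f", 8 for "d"
    count = len(data) // width
    return list(struct.unpack(f"<{count}{fmt_char}", data))


as_float32 = decode(encoded, "f")  # matching width
as_float64 = decode(encoded, "d")  # mismatched width

print(len(as_float32), len(as_float64))  # 1024 512
```

With this synthetic input, the values decoded at the wrong width are also extremely close to zero, which would be consistent with the symptom described above. Comparing the decoded byte length against the expected dimension times the element size is a quick way to rule this explanation in or out.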