ydb-python-sdk

bug: Slow vector parameter pass

Open azevaykin opened this issue 8 months ago • 4 comments

Bug Report

YDB Python SDK version: 3.21.1
Python version: 3.8.10
OS: Linux-5.4.210-39.1.pagevecsize-x86_64-with-glibc2.29

Behavior:

  1. When I pass a vector to the query as a list, I get 127 RPS.
  2. When I pass a vector to the query as a string, I get 617 RPS.

The first way is the default in YDB vector search and is what langchain-ydb uses, but it is slower. The second way is undocumented, but it is much faster.
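For readers unfamiliar with the two modes, here is a minimal standard-library sketch of how the payloads differ. The binary layout used here (little-endian float32 values followed by a one-byte format tag `0x01`) is an assumption about the serialized string format, and the helper names are hypothetical, not part of the SDK.

```python
import struct

# Assumption: float vectors are serialized as little-endian float32 values
# followed by a single tag byte 0x01. Verify against the YDB documentation.
FLOAT_FORMAT_TAG = b"\x01"


def vector_to_binary_string(vector):
    """Hypothetical helper: pack floats as float32 and append the format tag."""
    return struct.pack(f"<{len(vector)}f", *vector) + FLOAT_FORMAT_TAG


def binary_string_to_vector(data):
    """Hypothetical helper: inverse of vector_to_binary_string."""
    count = (len(data) - 1) // 4  # 4 bytes per float32, minus the tag byte
    return list(struct.unpack(f"<{count}f", data[:-1]))


embedding = [0.5, -1.25, 2.0]  # values exactly representable in float32

# "list" mode: the SDK has to convert every element of the Python list.
as_list = embedding

# "string" mode: the whole vector travels as one opaque bytes value.
as_string = vector_to_binary_string(embedding)

print(len(as_string))  # 13 (3 * 4 bytes + 1 tag byte)
print(binary_string_to_vector(as_string) == embedding)  # True
```

In "string" mode the resulting bytes would be wrapped as `ydb.TypedValue(as_string, ydb.PrimitiveType.String)`, so the per-element conversion happens once in `struct.pack` rather than for every element of a list parameter.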

For comparison, the C++ SDK gives 810 and 860 RPS.

Please fix passing a vector as a list in the Python SDK. 127 RPS is too slow.

See the example in the attached Python file, vector-parameter.py.

You can switch between the two modes with these lines:

MODE = "list"
# MODE = "string"

azevaykin, May 02 '25 16:05

@vgvoleg, please have a look

azevaykin, May 02 '25 16:05

@asmyasnikov, please have a look

azevaykin, May 02 '25 16:05

I took the attached Python file, vector-parameter.py, and made two changes:

1. On line 13, switched the mode:

# MODE = "list"
MODE = "string"

2. On line 58, added a bytes copy in order to rule out zero-copy:

parameters = {"$EmbeddingString": ydb.TypedValue(bytes(random.choice(embeddings_binary)), ydb.PrimitiveType.String)}

Result: the RPS is still high with the copy in place. Conclusion: we should consider passing the vector as a serialized string.
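A side note on the copy step, as a hedged sketch: whether `bytes(...)` actually copies depends on what `embeddings_binary` holds, which is not visible in this thread. In CPython, calling `bytes()` on an object that is already exactly `bytes` returns the same object, while calling it on a `bytearray` or `memoryview` allocates a new one. The variable names below are illustrative only.

```python
import struct

# Illustrative payload: four float32 values packed into 16 bytes.
packed = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

# CPython returns the very same object when the argument is already bytes.
same = bytes(packed)

# Going through a bytearray forces a fresh allocation and a real copy.
copied = bytes(bytearray(packed))

print(same is packed)    # True in CPython: no copy was made
print(copied is packed)  # False: a new object was allocated
print(copied == packed)  # True: the contents are identical
```

If the test data is stored as `bytes`, a forced copy such as `bytes(bytearray(...))` would be needed for the experiment to measure what it intends to.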

azevaykin, Jul 31 '25 09:07

Documentation changes describing the serialization format: https://github.com/ydb-platform/ydb/pull/22048

azevaykin, Jul 31 '25 13:07