Bytes Vectors from `r.hget` vs Bytes string returned from `r.ft().search(query="*")`

Open oq-9 opened this issue 1 year ago • 2 comments

Redis Python Lib Version: version 4.5.5

Redis Stack Version: version 7.0.0

Platform: Python 3.10.6 and Ubuntu 22.04

Description:

After storing several numpy vectors as bytes in hashes (HSET) and creating an index (FT), I am trying to retrieve all of the embeddings using FT.SEARCH with a "*" query. However, the vector is returned as a string that differs from the bytes I get when using HGET. Here are a few lines of code as an example:

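import json
import os

import numpy as np
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

# Assumes the environment variable holds the connection kwargs as a JSON object.
_redis_match_config = json.loads(os.getenv("NQAI_REDIS_MATCH_CONFIG"))
fake_vec = np.array([0.1, 0.2, 0.3, 0.4])
r = redis.Redis(**_redis_match_config)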
expert_hash = {"person_id":1, "vector_emb" : fake_vec.astype(np.float32).tobytes()}
r.hset("person:1", mapping=expert_hash)
index_name = "person"
person_prefix = f"{index_name}:"
vector_search_attributes = {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"}
schema = (
    TagField("person_id"),
    VectorField("embeddings_bio", algorithm="HNSW", attributes=vector_search_attributes),
)

r.ft(index_name).create_index(fields=schema, definition=IndexDefinition(prefix=[person_prefix], index_type=IndexType.HASH))

bytes_person_1 = r.hget("person:1", "vector_emb")
print(bytes_person_1)
print(np.frombuffer(bytes_person_1, dtype=np.float32))
> output : b"\xcd\xcc\xcc=\xcd\xccL>\x9a\x99\x99>\xcd\xcc\xcc>"
> output : array([0.1, 0.2, 0.3, 0.4], dtype=float32)

However, when I do:

query = Query("*").return_fields("id", "vector_emb")
all_of = r.ft(index_name).search(query=query, query_params={}).docs
print(all_of[0]["vector_emb"])
print(all_of[0]["vector_emb"].encode("utf-32"))
print(np.frombuffer(bytes(all_of[0]["vector_emb"].encode("utf-32")), dtype=np.float32))
> output : "=L>>>"
> output: b'\xff\xfe\x00\x00=\x00\x00\x00L\x00\x00\x00>\x00\x00\x00>\x00\x00\x00>\x00\x00\x00'
> output : array([9.1475e-41 8.5479e-44 1.0650e-43 8.6881e-44 8.6881e-44 8.6881e-44], dtype=float32)

I have tried different combinations of .encode("utf-xx") and dtype=np.floatxx to no avail! Please help. Thanks.

oq-9, May 22 '23 21:05
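
The `"=L>>>"` string in the report is what you get if the 16 raw bytes are decoded as UTF-8 with invalid bytes silently dropped. The sketch below reproduces that with the standard library only. It assumes (this is not confirmed in the thread) that the search result parser decodes field values this way; `raw` and `lossy` are illustrative names.

```python
import struct

# The four float32 values from the report, packed little-endian.
raw = struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)

# Assumption: field values in search results are decoded as UTF-8 and
# bytes that are not valid UTF-8 are discarded rather than raising.
lossy = raw.decode("utf-8", errors="ignore")

print(raw)            # b'\xcd\xcc\xcc=\xcd\xccL>\x9a\x99\x99>\xcd\xcc\xcc>'
print(repr(lossy))    # '=L>>>'

# 16 bytes went in, only 5 ASCII characters survived.
print(len(raw), len(lossy.encode("utf-8")))  # 16 5
```

If that is what happens, 11 of the 16 bytes are gone before the string reaches user code, so no choice of `.encode(...)` or `dtype` can recover the original vector from it.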

Yes, I had the same bug on my end when I tried retrieving the float vector from `results = self.client.ft(self.index).search(query_expression, query_params).docs`. It came back with a weird encoding that could not be decoded back to the original vector.

trish11953, Aug 25 '23 17:08

I'm having a similar issue. I need to read these vectors back and do some processing on them, but I'm unable to decode them when I read them from a hash using hget.

AdamAdLightning, Jan 24 '24 16:01
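
One possible workaround, sketched under assumptions: use the search only to find the matching keys, then read the vector field with HGET, which returned intact bytes in the original report. The helpers `decode_float32_vector` and `fetch_vectors` are hypothetical names, not part of redis-py. `client` is assumed to be a `redis.Redis` instance created without `decode_responses=True`, since that option would make HGET return `str` as well.

```python
import struct


def decode_float32_vector(raw):
    """Unpack little-endian float32 bytes into a list of Python floats."""
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def fetch_vectors(client, keys, field="vector_emb"):
    """Read `field` from each hash with HGET and decode it as float32."""
    vectors = {}
    for key in keys:
        raw = client.hget(key, field)
        if raw is None:
            # Key or field does not exist; skip it.
            continue
        if isinstance(raw, str):
            # The client already decoded the reply, so bytes may be lost.
            raise TypeError("expected bytes; is decode_responses=True set?")
        vectors[key] = decode_float32_vector(raw)
    return vectors
```

With the setup from the report, the keys could come from a content-free search, for example `keys = [doc.id for doc in r.ft(index_name).search(Query("*").no_content()).docs]`, followed by `fetch_vectors(r, keys)`. `np.frombuffer(raw, dtype=np.float32)` can replace `decode_float32_vector` if a numpy array is preferred. This costs one extra round trip per document, so a pipeline may be worth considering for large result sets.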