jedis: For vector similarity search, calling HSET with bytes data for a vector of floats requires little-endian byte order for the calculations to work in Redis

extramileit opened this issue 2 years ago (3 comments)

This issue is related to the discussion below. See discussion 3122.

After figuring out how to use the Node.js bindings for Redis, I was able to run a vector similarity search (VSS) against RediSearch successfully: distances were calculated and the results were sensible. When I compared the byte data from Node.js with the byte data I created in Java, I discovered that the Node.js code serialized the floats in little-endian order, whereas Java (for example DataOutputStream.writeFloat) uses big-endian order by default. Because of this, the bytes passed to Redis from the Jedis client using hset(byte[], byte[], byte[]) did not allow the distance function inside the RediSearch module to work, since RediSearch expects little-endian order.
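The difference is easy to reproduce with the Java standard library alone: a freshly allocated java.nio.ByteBuffer is big-endian, the same order DataOutputStream uses, and serializing one float in each order gives the same four bytes reversed. A minimal sketch; the class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteOrderDemo {
    // Serializes one float into 4 bytes using the given byte order.
    public static byte[] floatBytes(float f, ByteOrder order) {
        return ByteBuffer.allocate(Float.BYTES).order(order).putFloat(f).array();
    }

    public static void main(String[] args) {
        // A new ByteBuffer defaults to big-endian, like DataOutputStream.
        System.out.println(ByteBuffer.allocate(4).order()); // BIG_ENDIAN
        System.out.println(Arrays.toString(floatBytes(0.2f, ByteOrder.BIG_ENDIAN)));    // [62, 76, -52, -51]
        System.out.println(Arrays.toString(floatBytes(0.2f, ByteOrder.LITTLE_ENDIAN))); // [-51, -52, 76, 62]
    }
}
```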

Here is code that produces big-endian bytes in Java. With these bytes the distance functions do not work; you just get NaN results.

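import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// DataOutputStream.writeFloat writes each float as 4 bytes, high-order byte first (big-endian).
public static byte[] toBigEndianBytes(float[] floatArray) throws IOException {
    try (ByteArrayOutputStream bas = new ByteArrayOutputStream();
         DataOutputStream dos = new DataOutputStream(bas)) {
        for (float f : floatArray) {
            dos.writeFloat(f);
        }
        return bas.toByteArray();
    }
}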

// sample output (byte[].toString() would only print the array's identity, so use Arrays.toString)
Arrays.toString(toBigEndianBytes(new float[] {0.2f}))
// [62, 76, -52, -51]
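Reading those bytes the way a little-endian consumer would shows how far off the values get. The sketch below (names are illustrative) swaps the byte order of 0.2f with Integer.reverseBytes and reinterprets the result as a float. It only demonstrates that each vector component is garbled; it does not claim to reproduce exactly how RediSearch ends up with NaN.

```java
public class ByteOrderMismatch {
    // Returns the float a little-endian reader would see in the 4 bytes
    // that a big-endian writer (such as DataOutputStream) produced for f.
    public static float misreadAsLittleEndian(float f) {
        return Float.intBitsToFloat(Integer.reverseBytes(Float.floatToIntBits(f)));
    }

    public static void main(String[] args) {
        // 0.2f has bits 0x3e4ccccd; reversing the bytes gives 0xcdcc4c3e.
        System.out.println(Integer.toHexString(Integer.reverseBytes(Float.floatToIntBits(0.2f)))); // cdcc4c3e
        // Sign bit set and exponent 2^28: a large negative number (about -4.3e8) instead of 0.2.
        System.out.println(misreadAsLittleEndian(0.2f));
    }
}
```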

The following code works: distances are calculated and the results are valid.

public static byte[] toLittleEndianBytes(float[] floatArray) throws IOException {
    try (ByteArrayOutputStream bas = new ByteArrayOutputStream();
         DataOutputStream dos = new DataOutputStream(bas)) {
        for (float f : floatArray) {
            // write(byte[]) copies the bytes as-is, so the order set below is preserved
            dos.write(getBytesLittleEndianOrder(f));
        }
        return bas.toByteArray();
    }
}

// Splits the IEEE 754 bit pattern of f into 4 bytes, least significant byte first.
private static byte[] getBytesLittleEndianOrder(float f) {
    int intBits = Float.floatToIntBits(f);
    return new byte[] {(byte) intBits, (byte) (intBits >> 8), (byte) (intBits >> 16), (byte) (intBits >> 24)};
}
// sample output
Arrays.toString(toLittleEndianBytes(new float[] {0.2f}))
// [-51, -52, 76, 62]
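For comparison, the same little-endian layout can be produced with ByteBuffer set to ByteOrder.LITTLE_ENDIAN, which also makes it easy to decode bytes back into floats and check a round trip. A minimal sketch; the class and method names are hypothetical and not part of Jedis.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class Float32Blob {
    // Encodes a vector as consecutive little-endian IEEE 754 floats (4 bytes each).
    public static byte[] encode(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float f : vector) {
            buffer.putFloat(f);
        }
        return buffer.array();
    }

    // Decodes the same layout back into floats, e.g. to inspect bytes read from Redis.
    public static float[] decode(byte[] blob) {
        ByteBuffer buffer = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN);
        float[] vector = new float[blob.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buffer.getFloat();
        }
        return vector;
    }

    public static void main(String[] args) {
        float[] vector = {0.2f, -1.5f, 3.0f};
        byte[] blob = encode(vector);
        // First 4 bytes match the shift-based version above.
        System.out.println(Arrays.toString(Arrays.copyOf(blob, 4))); // [-51, -52, 76, 62]
        System.out.println(Arrays.equals(vector, decode(blob)));     // true
    }
}
```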

I'm not sure whether this little-endian byte mapping should be part of the Jedis client code. If not, it may be useful to document it in the Javadocs or elsewhere; perhaps the hset Javadocs could mention that vector similarity search requires the little-endian byte representation.

Thanks

Discussed in https://github.com/redis/jedis/discussions/3122

Originally posted by nemo83, August 26, 2022

Hello,

I've recently come across HNSW and found out that Redis has added support for it. So instead of building an API that manages an HNSW index, I'm thinking of just deploying a nice Redis cluster and using its HNSW capabilities.

My vectors are float[], and loading the index apparently works fine, since I'm using jsonSet. When I create the index, the data are loaded from the JSON, correctly deserialised into float32, and the index is built. Or at least that's what I think is happening.

Now my problem is that I don't seem to be able to query the index.

This is my code:

float[] tensor = ...

String queryString = "*=>[KNN 10 @vector $tensor]";

Query q = new Query(queryString)
                .addParam("tensor", tensor)
                .dialect(2);

SearchResult searchResult = jedisClient.ftSearch("word-index", q);
long results = searchResult.getTotalResults();
System.out.println(results); // gives me 0
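A plausible cause of the 0 results, offered as an assumption rather than a confirmed diagnosis: RediSearch expects the KNN query parameter to be a binary blob of FLOAT32 values in the same little-endian layout discussed above, even when the documents are stored as JSON arrays. If the client stringifies a float[] parameter, the server receives an object identifier instead of the vector. The sketch below shows that string form and builds the blob instead; the class name and sample vector are hypothetical, and the Jedis call in the final comment is an unverified illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class QueryVectorParam {
    // Packs the query vector as FLOAT32 values in little-endian order, 4 bytes per component.
    public static byte[] toFloat32Blob(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float f : vector) {
            buffer.putFloat(f);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        float[] tensor = {0.1f, 0.2f, 0.3f}; // hypothetical 3-dimensional query vector

        // Turning a float[] into a String yields a type tag plus an identity hash
        // (something like "[F@1b6d3586"), not the vector's contents.
        System.out.println(String.valueOf(tensor));

        byte[] blob = toFloat32Blob(tensor);
        System.out.println(blob.length); // 12: 3 floats x 4 bytes

        // Assumed Jedis usage (not verified against a specific version): pass the blob,
        // not the float[], as the parameter value.
        // Query q = new Query("*=>[KNN 10 @vector $tensor]").addParam("tensor", blob).dialect(2);
    }
}
```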

In the tests I found this: https://github.com/redis/jedis/blob/master/src/test/java/redis/clients/jedis/modules/search/SearchTest.java#L440

That seems to work for strings, bytes and floats (but not float[]?!)

Any suggestions would be greatly appreciated.