redis-ai-resources

Contribute advanced Hybrid search example in OpenAI Cookbook (python)

Open tylerhutcherson opened this issue 2 years ago • 5 comments

The existing cookbook just touches the surface: https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/redis/getting-started-with-redis-and-openai.ipynb

Contribute a Python notebook that demonstrates complex Hybrid queries with Redis VSS and other search features (an ecommerce dataset might work nicely) including

  • Numeric range filters
  • Tag filters
  • Full text search "filters"
  • Client-side hybrid scoring combining both BM25 lexical AND semantic search. This could be done in a pipeline that sends one Redis call to fetch both sets of search results (top K) and then merges them. Show the performance improvement of this technique over pure lexical or pure semantic search?
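
The client-side merge in the last bullet could look roughly like the sketch below. It is only an illustration under stated assumptions, not something taken from the cookbook or from Redis: the `Scored` shape, the `normalize` and `mergeHybrid` names, the min-max normalization, and the `alpha` weight are all hypothetical choices. It assumes the lexical list carries BM25 scores (higher is better) and the semantic list carries cosine distances (lower is better), both already fetched from Redis.

```typescript
// Hypothetical result shape for this sketch; not the node-redis reply type.
interface Scored {
  id: string;
  score: number;
}

// Min-max normalize scores into [0, 1]; a list with one distinct score maps to all 1s.
function normalize(items: Scored[]): Record<string, number> {
  const out: Record<string, number> = {};
  if (items.length === 0) return out;
  const scores = items.map((item) => item.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  for (const item of items) {
    out[item.id] = max === min ? 1 : (item.score - min) / (max - min);
  }
  return out;
}

// lexical: BM25 scores (higher is better).
// semantic: cosine distances (lower is better), flipped to similarities first.
// alpha: weight of the semantic side in [0, 1] (an illustrative default).
function mergeHybrid(
  lexical: Scored[],
  semantic: Scored[],
  alpha = 0.5,
  k = 10
): Scored[] {
  const lex = normalize(lexical);
  const sem = normalize(
    semantic.map((r) => ({ id: r.id, score: 1 - r.score }))
  );
  const ids = Object.keys({ ...lex, ...sem });
  const merged = ids.map((id) => ({
    id,
    // A document missing from one list contributes 0 from that side.
    score: alpha * (sem[id] ?? 0) + (1 - alpha) * (lex[id] ?? 0),
  }));
  return merged.sort((a, b) => b.score - a.score).slice(0, k);
}
```

Min-max normalization is just one way to put the two score scales on a common footing; reciprocal rank fusion or z-score normalization would be reasonable alternatives to compare in the notebook.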

tylerhutcherson avatar May 02 '23 16:05 tylerhutcherson

Submitted PR

michaelskyuan avatar May 11 '23 23:05 michaelskyuan

Initial review submitted from our end >>> https://github.com/openai/openai-cookbook/pull/417

@michaelskyuan at some point we will also want to update this notebook to cover bullet point 4 above. This is a bit "green field" in the sense that we have not yet explicitly tried it. But it's theoretically possible to do true weighted hybrid search using a Redis pipeline command and "merging" results from the two scoring algorithms (BM25 + KNN/CosineD). I sense the lift will be a bit more on this, and since it is not immediately pressing, I will spin it off into a separate issue that we can re-prioritize when the time is right, probably in the next month.

tylerhutcherson avatar May 16 '23 14:05 tylerhutcherson

I agree @tylerhutcherson. And I believe this topic deserves its own separate notebook with a denser text dataset. Let's leave the OOTB Redis Hybrid search functionality in the current notebook and have a dedicated notebook that addresses normalization of lexical and semantic scoring, using a more appropriate dataset than an ecommerce one.

michaelskyuan avatar May 25 '23 16:05 michaelskyuan

@michaelskyuan This App/notebook was recently contributed by OpenAI. We could use some of this.

Spartee avatar May 26 '23 22:05 Spartee

I am struggling to apply a hybrid filter similar to these examples. The difference is that I am using langchainjs 0.2.16 from TypeScript with the standard bindings, against RediSearch 2.6 in the cloud.

I have the following metadata associated with a chunk item:

{\\\"name\\\"\\:\\\"20030331\\\",\\\"description\\\"\\:\\\"2003/03/31\\\",\\\"year\\\"\\:2003,\\\"month\\\"\\:3,\\\"day\\\"\\:31,\\\"date\\\"\\:20030331,\\\"id\\\"\\:\\\"5251cc41\\-307d\\-4117\\-9b0f\\-9e408eb37011\\\",\\\"doc_id\\\"\\:\\\"5251cc41\\-307d\\-4117\\-9b0f\\-9e408eb37011\\\"}

I would like to filter on a range for the year, for example with something like the following:

@metadata:(\\\"year\\\"\\:[(2001 2004])

but it always returns 0 records. Exact match and negation both work. I started with a negation:

filterQuery = `@metadata:(-\\\"year\\\"\\:2001)`;
itemList = await client.ft.search(chunk, filterQuery);
console.log(filterQuery, itemList.total);

OK, 3 items returned

@metadata:(-\"year\"\:2001) 3

Then I tried a range:

filterQuery = `@metadata:(\\\"year\\\"\\:\\[2003 2005\\])`;
itemList = await client.ft.search(chunk, filterQuery);
console.log(filterQuery, itemList.total);

This should return elements, but I get:

@metadata:(\"year\"\:\[2003 2005\]) 0

A very similar attempt, this time with an exclusive lower bound:

filterQuery = `@metadata:(\\\"year\\\"\\:\\[\\(2003 2005\\])`;
itemList = await client.ft.search(chunk, filterQuery);
console.log(filterQuery, itemList.total);

This should return elements too, but I still get 0:

@metadata:(\"year\"\:\[\(2003 2005\]) 0

Trying other filters and using And (&) worked fine:

filterQuery = `@metadata:(\\\"year\\\"\\:2003&\\\"date\\\"\\:20030331)`;
itemList = await client.ft.search(chunk, filterQuery);
console.log(filterQuery, itemList.total);

OK, 1 item filtered

@metadata:(\"year\"\:2003&\"date\"\:20030331) 1

I got some ideas from the referenced notebook but still cannot make it work. Any ideas?

Thanks in advance!

ladrians avatar Aug 15 '24 17:08 ladrians
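
For reference, RediSearch's numeric range syntax (`@field:[min max]`, with `(` marking an exclusive bound) applies to fields indexed as NUMERIC. The queries above put escaped brackets inside a clause on `@metadata`, which appears to be a single text field holding escaped JSON, so the range would be matched as literal text rather than evaluated numerically. That reading is an assumption, since the thread does not show how the index was created. The sketch below only builds the filter string. `Bound` and `numericRange` are hypothetical names, and it assumes an index where `year` is a NUMERIC field of its own.

```typescript
// Hypothetical helper types for this sketch; not part of node-redis or langchainjs.
interface Bound {
  value: number;
  exclusive?: boolean; // true maps to the "(" prefix in RediSearch range syntax
}

// Format one bound, mapping infinities to RediSearch's -inf / +inf tokens.
function formatBound(b: Bound): string {
  let text: string;
  if (b.value === Infinity) {
    text = "+inf";
  } else if (b.value === -Infinity) {
    text = "-inf";
  } else {
    text = String(b.value);
  }
  return (b.exclusive ? "(" : "") + text;
}

// Build a numeric range filter such as "@year:[2003 2005]".
// ASSUMPTION: `field` is indexed as NUMERIC in the target index.
function numericRange(field: string, min: Bound, max: Bound): string {
  return `@${field}:[${formatBound(min)} ${formatBound(max)}]`;
}

// Intended use, mirroring the calls above (needs a NUMERIC `year` field):
//   const filterQuery = numericRange("year", { value: 2003 }, { value: 2005 });
//   const itemList = await client.ft.search(chunk, filterQuery);
```

Whether langchainjs can be configured to index `year` as a separate NUMERIC field is not something this thread confirms, so that part would need checking against the library's documentation.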