[Feature Request]: Retrieve chunks similar to another chunk

Open panzi opened this issue 6 months ago • 3 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Is your feature request related to a problem?

We need to find chunks that are similar to a given chunk. That should be a simple vector search, since the chunk's embedding vector is already in RAGFlow's database. However, there is no API for that kind of retrieval, so you have to send the chunk's text as a query, which runs it through the embedding process again. That is slow and incurs unnecessary API costs.

Describe the feature you'd like

A simple API method to retrieve chunks that are similar to other chunks of the same document, with the same similarity parameter as the retrieve() method.

Describe implementation you've considered

Using the text of the chunk as a query.

Documentation, adoption, use case

Finding similar products to a given product.

Additional information

No response

panzi avatar Jun 12 '25 12:06 panzi

I don't know the RAGFlow internals, but the way I imagine it this should be easy to implement. It would be the same as retrieve() but with fewer steps. So copy paste retrieve() and instead of getting the embedding vector from an embedding service get it from the knowledge base via a chunk ID.
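A minimal sketch of that idea, assuming the stored embeddings can be read by chunk ID. The in-memory dictionary stands in for the knowledge base, and `similar_chunks`, `CHUNK_VECTORS`, and the parameter names are hypothetical, not RAGFlow APIs:

```python
import math

# Hypothetical stand-in for the knowledge base: chunk ID -> stored embedding.
# In a real deployment these vectors would already live in the vector store.
CHUNK_VECTORS = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
    "chunk-d": [0.7, 0.7, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_chunks(chunk_id, vectors, similarity_threshold=0.2, top_k=3):
    """Rank other chunks by similarity to the stored vector of chunk_id.

    The query vector is looked up by ID, so no embedding service is called.
    """
    query = vectors[chunk_id]
    scored = [
        (other_id, cosine(query, vec))
        for other_id, vec in vectors.items()
        if other_id != chunk_id  # do not return the chunk itself
    ]
    hits = [(cid, score) for cid, score in scored if score >= similarity_threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_k]


results = similar_chunks("chunk-a", CHUNK_VECTORS)
```

With these toy vectors, `chunk-c` is orthogonal to `chunk-a` and falls below the threshold, so only `chunk-b` and `chunk-d` come back, in that order.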

panzi avatar Jun 12 '25 12:06 panzi

Interesting idea. As far as I know, the internal retrieve method combines full-text search and embedding search. Based on your description, I think you only need the embedding search?

Woody-Hu avatar Jun 13 '25 00:06 Woody-Hu

@panzi could you explain more about your use case?

ZhenhangTung avatar Jun 13 '25 02:06 ZhenhangTung

@ZhenhangTung It's meant as a cheap way to show similar products. Like the user views a certain product and gets a list of similar products shown simply by vector similarity of the embeddings.

panzi avatar Jun 23 '25 18:06 panzi

@panzi ahhh! Got it. It's a common use case for ecommerce platforms. I've noted it down and will discuss it with the team.

ZhenhangTung avatar Jun 24 '25 09:06 ZhenhangTung

@panzi One follow-up: I know that in many ecommerce platforms, recommendation algorithms are not only based on vector similarity but also incorporate behavioral signals like page views, co-clicks, etc. I’m curious — are you building a recommendation system for an ecommerce use case? If so, I’d love to hear how you imagine RAGFlow might help.

ZhenhangTung avatar Jun 24 '25 09:06 ZhenhangTung

I just discussed this with @KevinHuSh. We suggest using the retrieval API to achieve your goal, with your product details as the question.

ZhenhangTung avatar Jun 24 '25 11:06 ZhenhangTung

@ZhenhangTung It is a cheap little recommendation system as an extra to a chatbot.

And yes, of course I use the retrieval API like that right now, but that means re-embedding the product details every time similar products are displayed. That is just wasteful, especially when using a third-party service for embedding. The embedding already exists with the chunk in the knowledge base, and I have that chunk ID stored in our DB with the product.

I assume the retrieval API just computes the same embedding and then does a vector lookup, so the step of calling out to the embedding service could be skipped. It's for efficiency and cost reduction.
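To make the cost difference concrete, here is a toy sketch that counts embedding calls for both paths. `CountingEmbedder` is a hypothetical stub for a third-party embedding service, and the product texts are made up; nothing here is RAGFlow code:

```python
class CountingEmbedder:
    """Hypothetical stub for a paid embedding service; counts every call."""

    def __init__(self):
        self.calls = 0

    def embed(self, text):
        self.calls += 1
        # Toy deterministic "embedding": vowel count, consonant count, length.
        vowels = sum(ch in "aeiou" for ch in text.lower())
        letters = sum(ch.isalpha() for ch in text)
        return [float(vowels), float(letters - vowels), float(len(text))]


embedder = CountingEmbedder()
products = {
    "p1": "red running shoes",
    "p2": "blue running shoes",
    "p3": "steel kettle",
}

# Ingestion: each product chunk is embedded once and the vector is stored.
stored = {pid: embedder.embed(text) for pid, text in products.items()}
calls_after_ingestion = embedder.calls


def query_vector_from_text(pid):
    """Current workaround: send the chunk text as the query, so it is re-embedded."""
    return embedder.embed(products[pid])


def query_vector_from_id(pid):
    """Requested behaviour: reuse the vector already stored for the chunk."""
    return stored[pid]


# Simulate 100 product page views with each approach.
for _ in range(100):
    query_vector_from_text("p1")
text_path_calls = embedder.calls - calls_after_ingestion

calls_before_id_path = embedder.calls
for _ in range(100):
    query_vector_from_id("p1")
id_path_calls = embedder.calls - calls_before_id_path
```

Both functions yield the same query vector; the text path pays one embedding call per page view, while the ID path pays none after ingestion.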

panzi avatar Jun 25 '25 19:06 panzi