LangChain LakeFSLoader: load document metadata tags

Open Isan-Rivkin opened this issue 11 months ago • 3 comments

Problem

Currently our LakeFSLoader loads documents without their object metadata tags. It only adds the logical path to the Document, under the source key, and ignores any user-set object metadata.

So What?

This means LakeFSLoader stops being useful the moment I need object metadata for embedding, retrieval, etc. A very common RAG example is a time-weighted retriever.

To use the object tags, I would have to implement custom logic that lists all documents and fetches their tags. Bottom line: LakeFSLoader gives me nothing, since I would have to load everything myself anyway.

Suggestion

When calling the LakeFSLoader constructor in LangChain, add an optional parameter indicating whether object metadata tags should be added to the Document.

loader = LakeFSLoader(repo="...", ref="...", path="...", with_metadata=True)
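
To make the intended effect concrete, here is a minimal sketch of how the flag could change the metadata attached to each Document. It is not the loader's actual code: `build_document`, the stand-in `Document` class, the example path, and the `created_at` tag are all hypothetical, and letting the source key win over a user tag of the same name is just one possible choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Document:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def build_document(
    content: str,
    logical_path: str,
    object_metadata: Optional[Dict[str, str]] = None,
    with_metadata: bool = False,
) -> Document:
    """Hypothetical helper showing the proposed with_metadata behavior."""
    # Current behavior: only the logical path is recorded, under "source".
    metadata = {"source": logical_path}
    if with_metadata and object_metadata:
        # Proposed behavior: also copy the user-set object metadata.
        # setdefault keeps "source" intact if a user tag has the same name.
        for key, value in object_metadata.items():
            metadata.setdefault(key, value)
    return Document(page_content=content, metadata=metadata)


# Hypothetical object path and tag, for illustration only.
doc = build_document(
    "hello",
    "lakefs://repo/main/docs/a.txt",
    object_metadata={"created_at": "2024-12-01"},
    with_metadata=True,
)
print(doc.metadata)
```

With with_metadata left at its default, the Document carries only the source key, so existing callers would see no change.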

Isan-Rivkin, Dec 09 '24 09:12

@itaiad200 what are the planned next steps for this item?

talSofer, Mar 06 '25 14:03

If we need to change the package to achieve this, let's do it

ozkatz, Mar 12 '25 13:03

I created a new repository from the LangChain boilerplate and opened a PR implementing a lakeFS document loader that supports metadata: https://github.com/treeverse/langchain-lakefs/pull/1

guy-har, Mar 19 '25 08:03