lakeFS
LangChain LakeFSLoader: load document metadata tags
Problem
Currently, our LakeFSLoader loads documents without their object metadata tags: it only adds the logical path to the Document under the source key. Any user-set object metadata is ignored.
So What?
This means LakeFSLoader stops being helpful the moment I need object metadata for embedding, retrieval, etc. A very common example in RAG is a time-weighted retriever.
To use the object tags, I would have to implement custom logic that lists all documents and fetches their tags. Bottom line: I have no use for LakeFSLoader, since I would have to load everything myself anyway.
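To make the cost of that workaround concrete, here is a rough sketch of the custom logic a user ends up writing today. It is only an illustration: `Document` is a stand-in dataclass rather than LangChain's class, and `load_with_tags`, `read_object`, `read_tags`, and `FAKE_STORE` are hypothetical names backed by an in-memory dict, not real lakeFS client calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    """Stand-in for LangChain's Document: text content plus a metadata dict."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def load_with_tags(
    paths: Iterable[str],
    read_object: Callable[[str], str],
    read_tags: Callable[[str], Dict[str, str]],
) -> List[Document]:
    """Hypothetical workaround: one content read and one tag read per object."""
    docs = []
    for path in paths:
        metadata = dict(read_tags(path))  # copy so the caller's dict is untouched
        metadata["source"] = path         # keep the key the loader already sets
        docs.append(Document(page_content=read_object(path), metadata=metadata))
    return docs


# In-memory stand-in for a repository: path -> (content, user metadata tags).
FAKE_STORE = {
    "docs/a.txt": ("hello", {"created_at": "2024-01-01"}),
    "docs/b.txt": ("world", {}),
}

docs = load_with_tags(
    FAKE_STORE,                        # iterating a dict yields its keys (paths)
    lambda p: FAKE_STORE[p][0],
    lambda p: FAKE_STORE[p][1],
)
```

At this point the user has reimplemented listing, reading, and Document construction, which is everything the loader was supposed to do.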
Suggestion
When calling the LakeFSLoader constructor in LangChain, add an optional parameter to indicate whether metadata tags should be added to the Document.
loader = LakeFSLoader(repo="...", ref="...", path="...", with_metadata=True)
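One possible reading of what the flag would do to each Document's metadata is sketched below. This is an assumption about the desired behavior, not the loader's actual code: `build_metadata` is a hypothetical helper, and letting the loader's own source key win over a same-named user tag is a design choice that is open for discussion.

```python
from typing import Dict, Optional


def build_metadata(
    path: str,
    user_metadata: Optional[Dict[str, str]],
    with_metadata: bool = False,
) -> Dict[str, str]:
    """Hypothetical helper: compute the metadata dict for one loaded Document."""
    metadata: Dict[str, str] = {}
    if with_metadata and user_metadata:
        metadata.update(user_metadata)  # copy the user-set object tags
    # Set last so the loader-owned key wins if a user tag is also named "source".
    metadata["source"] = path
    return metadata


tags = {"created_at": "2024-01-01", "source": "user-value"}
default_meta = build_metadata("docs/a.txt", tags)
tagged_meta = build_metadata("docs/a.txt", tags, with_metadata=True)
```

Defaulting `with_metadata` to False keeps today's behavior for existing callers, so only users who opt in see the extra keys.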
@itaiad200 what are the planned next steps for this item?
If we need to change the package to achieve this, let's do it
Created a new repository using the LangChain boilerplate and added a PR implementing a lakeFS document loader that supports metadata: https://github.com/treeverse/langchain-lakefs/pull/1