langchain
Minor Change For New Wikipedia Loader
It's important for documents to have a metadata["source"] field; it is needed, for example, by index.query_with_sources().
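To illustrate why the field matters, here is a minimal sketch of how source-aware code might gather citations from loaded documents. The Doc class and collect_sources helper are hypothetical stand-ins written for this example, not langchain's actual classes or the implementation behind index.query_with_sources(); the URL is only an example value.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document (not langchain's class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def collect_sources(docs):
    """Return the distinct metadata["source"] values in first-seen order."""
    seen = []
    for doc in docs:
        source = doc.metadata.get("source")
        # Documents without a "source" key cannot be cited, so they are skipped.
        if source is not None and source not in seen:
            seen.append(source)
    return seen


docs = [
    Doc("Alan Turing was ...", {"source": "https://en.wikipedia.org/wiki/Alan_Turing"}),
    Doc("Text from a loader that set no source", {"title": "Untitled"}),
]
print(collect_sources(docs))
```

In this sketch the second document silently drops out of the citation list, which is the kind of gap a consistent "source" key would avoid.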
@eyurtsev
I like it. Any objections, @eyurtsev @leo-gan?
It's a breaking change, so if we're OK proceeding, let's relabel the commit title as "Breaking Change" so we remember to list it as such in the release notes.
We'll need to resolve the merge conflict first, and I'm looking for input from @leo-gan.
Could just keep "page_url" in the metadata as well.
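A rough sketch of that non-breaking option, assuming the loader's existing metadata holds the page URL under "page_url" and that the same value is what "source" should contain. The with_source helper is hypothetical and not part of the loader.

```python
def with_source(metadata, legacy_key="page_url"):
    """Return a copy of metadata with "source" set, keeping the legacy key.

    Assumption: the value stored under legacy_key is a suitable "source".
    """
    updated = dict(metadata)  # copy, so the caller's dict is not mutated
    if "source" not in updated and legacy_key in updated:
        updated["source"] = updated[legacy_key]
    return updated


old_metadata = {
    "title": "Alan Turing",
    "page_url": "https://en.wikipedia.org/wiki/Alan_Turing",
}
new_metadata = with_source(old_metadata)
print(new_metadata)
```

Existing callers that read "page_url" keep working, while anything that expects "source" finds it too.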
I'm in favor of having a generic provenance field that captures the protocol / storage. As long as the provenance field is completely specified, it makes it easy to treat all content on an equal footing, regardless of whether it came from S3, a website, or a row in a Postgres database.
With that said, I doubt that our sources are specified correctly at the moment, but would be in favor of moving in that direction.
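One way to picture a completely specified provenance field is a URI whose scheme names the protocol or storage. The sketch below is only an assumption about what such a field could look like; the example URIs (including the Postgres row layout) and the describe_provenance helper are hypothetical and nothing here reflects an agreed format.

```python
from urllib.parse import urlparse


def describe_provenance(source):
    """Split a URI-style source into protocol, location, and path."""
    parsed = urlparse(source)
    return {
        "protocol": parsed.scheme,
        "location": parsed.netloc,
        "path": parsed.path,
    }


# Hypothetical examples of fully specified sources.
examples = [
    "s3://my-bucket/docs/report.txt",
    "https://en.wikipedia.org/wiki/Alan_Turing",
    "postgresql://db.example.com/mydb/articles/42",
]
for source in examples:
    print(describe_provenance(source))
```

With the protocol embedded in the value itself, downstream code can handle every document the same way without knowing which loader produced it.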
Any suggestions for this PR specifically? (For now; agreed that we should come up with a more thoughtful approach in the medium term.)