Add an easy way to use LangChain components without writing Python code
It is pretty easy to use LangChain, but you still have to write Python code.
There are some APIs that should be easy to integrate without writing code, for instance the DocumentLoader API. https://python.langchain.com/docs/modules/data_connection/document_loaders/
What about something like:
pipeline:
  - type: "s3-source"
    .....
  - type: "langchain-documentloader-processor"
    configuration:
      documentLoader: CSVLoader
      params....
    ....
This way, LangStream users can use the full power of LangChain without writing any code.
This would be a huge value add. I can look into it further.
To my knowledge, we would be the only platform on the market with a no-code option for LangChain, and we could co-market it in all the LangChain forums, etc.
The interface for the document loader is pretty simple:
load()
lazy_load()
and in some cases:
load_and_split(text_splitter)
We would just need to account for the constructors, which seem to be more widely varied.
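The interface listed above could be captured roughly as follows. This is only a sketch: `DocumentLoaderLike`, `iter_documents`, and `EagerOnlyLoader` are hypothetical names, and it assumes that a loader without lazy-loading support raises `NotImplementedError` from `lazy_load()`.

```python
from typing import Any, Iterator, List, Protocol


class DocumentLoaderLike(Protocol):
    """Structural type for the two common loader methods (hypothetical name)."""

    def load(self) -> List[Any]: ...

    def lazy_load(self) -> Iterator[Any]: ...


def iter_documents(loader: DocumentLoaderLike) -> Iterator[Any]:
    """Yield documents one at a time, preferring lazy_load() when available."""
    try:
        yield from loader.lazy_load()
    except NotImplementedError:
        # Assumption: loaders that cannot stream signal it this way,
        # so fall back to the eager load() call.
        yield from loader.load()


class EagerOnlyLoader:
    """Stand-in loader used only for this example; not a LangChain class."""

    def load(self) -> List[Any]:
        return ["doc-1", "doc-2"]

    def lazy_load(self) -> Iterator[Any]:
        raise NotImplementedError


docs = list(iter_documents(EagerOnlyLoader()))
```

A generic agent could then treat every loader the same way after construction, which leaves the constructors as the only per-loader difference.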
The other things we'd really want to find a way to hook into (from the LangChain project) would be chains, embeddings, prompts, and LLMs.
That would be quite easy to do in Python, as it's easy to look up the document loader class by name and pass the constructor parameters by name via reflection.
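A minimal sketch of that lookup, assuming the agent receives the `documentLoader` name and a `params` mapping from the pipeline configuration. `build_loader` and `FakeCSVLoader` are hypothetical names used for illustration; in a real agent the namespace would be the imported LangChain loaders module rather than the stand-in used here.

```python
import inspect
from types import SimpleNamespace


class FakeCSVLoader:
    """Stand-in for a LangChain loader; hypothetical, for illustration only."""

    def __init__(self, file_path, encoding="utf-8"):
        self.file_path = file_path
        self.encoding = encoding

    def load(self):
        # A real loader would return Document objects; plain dicts are used here.
        return [{"page_content": "row 1", "metadata": {"source": self.file_path}}]


def build_loader(namespace, class_name, params):
    """Look up a loader class by name and call its constructor with named params."""
    cls = getattr(namespace, class_name, None)
    if cls is None or not inspect.isclass(cls):
        raise ValueError(f"Unknown document loader: {class_name}")
    # Fail early with a readable message if the configured params
    # do not match the constructor signature.
    try:
        inspect.signature(cls).bind(**params)
    except TypeError as err:
        raise ValueError(f"Bad params for {class_name}: {err}") from err
    return cls(**params)


# Assumption: a real agent would pass the imported loaders module here
# instead of this stand-in namespace.
loaders = SimpleNamespace(FakeCSVLoader=FakeCSVLoader)
config = {"documentLoader": "FakeCSVLoader", "params": {"file_path": "data.csv"}}
loader = build_loader(loaders, config["documentLoader"], config["params"])
documents = loader.load()
```

Binding the parameters against the signature before calling the constructor means a misconfigured pipeline reports the offending parameter name instead of failing deep inside the loader.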
The problem I see is how to indicate, in a generic way, that the source has finished processing a document.
For instance in the LangChain S3 source, we delete the file from the bucket when commit is called.
We need some equivalent mechanism; otherwise a pod restart will trigger the document loading again.
See https://github.com/LangStream/langstream/pull/543 for a first step in that direction.
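To make the problem concrete, here is one possible shape for a generic "already processed" marker. It is purely a sketch and not what the linked PR implements: `ProcessedStore` and `run_once` are hypothetical names, and the state is kept in a local JSON file, which would only survive a pod restart if it lived on persistent storage.

```python
import json
import os
import tempfile


class ProcessedStore:
    """Remembers which source keys were committed (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self._keys = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._keys = set(json.load(f))

    def is_processed(self, key):
        return key in self._keys

    def commit(self, key):
        self._keys.add(key)
        # Write to a temp file and rename it so a crash cannot leave
        # a half-written state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(sorted(self._keys), f)
        os.replace(tmp, self.path)


def run_once(store, sources, load):
    """Load every source that has not been committed yet; return what was loaded."""
    loaded = []
    for key in sources:
        if store.is_processed(key):
            continue
        load(key)
        store.commit(key)  # mark as done only after loading succeeded
        loaded.append(key)
    return loaded


# Simulate a restart by creating a fresh store over the same state file.
state_file = os.path.join(tempfile.mkdtemp(), "processed.json")
first = run_once(ProcessedStore(state_file), ["a.csv", "b.csv"], load=lambda k: None)
second = run_once(ProcessedStore(state_file), ["a.csv", "b.csv", "c.csv"], load=lambda k: None)
```

In this sketch the second run only loads `c.csv`, because the first two keys were committed before the simulated restart.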