langchain
Add Chain equivalents of Map() and Reduce()
These are two chains which provide map() and reduce() behaviour. I used them as follows:
Setup:
# this is a 'reducer' chain - similar to the callbackFn here https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce
extraction_chain = ExtractionChain(llm=llm)
# this is a 'mapper' chain
transform_chain = TransformChain(input_variables=["input"], output_variables=["output"], transform=transform_func)
reduce_chain = ReduceChain(llm=llm, reducer_chain=extraction_chain)
map_chain = MapChain(llm=llm, mapper_chain=transform_chain)
Usage:
# I want to extract facts from multiple documents and transform them into a certain format
docs = ["a.pdf", "b.pdf"]
doc_list = [PyPDFLoader(doc).load_and_split() for doc in docs]
texts = [item.page_content for sublist in doc_list for item in sublist]
reduce_response = reduce_chain.run(input_values=texts, initial_value=[])
map_response = map_chain.run(input_values=reduce_response)
If you think this is useful I'll polish it up + add docs and tests.
Thank you for this amazing project!
hey @IndikaUdagedara, i definitely think this idea has legs. have you tried using the MapReduceDocumentsChain by chance? curious if that is similar to what you're thinking of
Yes, I had a look at that one, but it was not 'primitive' enough for my use case, i.e. it did more than I wanted: it collects metadata, manipulates Docs rather than strs, etc. My use case is: I have a bunch of texts which I want to extract 'facts' from and return all of them as a list, and then pass that list to a mapper which transforms it into a certain format. I suppose I could use the SummarizeChain for this purpose if it allowed customizing the prompt, but then again summarization is just one form of reduction, so I thought a Reduce chain would be something to generalize that.
I suppose we could say MapReduceDocumentsChain is a special case of a Reduce chain + Map chain?
@IndikaUdagedara all chains have an apply method on them - that covers the map use case right?
could you motivate the reduce use case a bit more?
@hwchase17 ah I must've missed the apply method. I suppose it covers the map use case then (which is basically applying a chain to a list and returning a list of the same length).
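To make the map semantics above concrete, here is a minimal sketch in plain Python with no LangChain calls. The names apply_to_each and to_upper are hypothetical stand-ins for "apply a chain to a list" and for a mapper chain; they are not part of the PR or the library.

```python
from typing import Callable, List


def apply_to_each(mapper: Callable[[str], str], inputs: List[str]) -> List[str]:
    """Apply `mapper` to every input; the result has the same length as `inputs`."""
    return [mapper(text) for text in inputs]


def to_upper(text: str) -> str:
    # Hypothetical mapper: stands in for whatever a mapper chain would do.
    return text.upper()


mapped = apply_to_each(to_upper, ["a", "b", "c"])
print(mapped)
```

The defining property is that one output is produced per input, so the lengths always match.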
The reduce chain is somewhat a generalization of the SummarizeChain. What I want is to be able to pass a list of texts to a chain that performs some operation on each item and produces a single value as the output (equivalent to the Python or JS reduce() functions). My use case specifically is: I have a list of documents and I want to get the facts out of all of them. The output is a collection of facts that is not necessarily the same length as the original list.
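The reduce semantics described above can be sketched with Python's functools.reduce. This is only an illustration under stated assumptions: extract_facts is a hypothetical reducer that splits text on periods, standing in for an LLM-backed extraction step, and it is not the PR's implementation.

```python
from functools import reduce
from typing import List


def extract_facts(facts: List[str], text: str) -> List[str]:
    # Hypothetical reducer: takes the facts accumulated so far plus one text
    # and returns the updated accumulator. A "fact" here is simply any
    # non-empty, period-terminated sentence.
    new_facts = [s.strip() for s in text.split(".") if s.strip()]
    return facts + new_facts


texts = ["Paris is in France. It has a tower.", "", "Rust is a language. Go is too."]

# Fold the whole list into a single value, starting from an empty list.
all_facts = reduce(extract_facts, texts, [])
print(len(texts), len(all_facts))
```

Three input texts yield four facts here, which shows why the output of a reduce is not tied to the length of the input list.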
The way this can be generalized to the summarization use case is: the stuff, map_reduce, refine etc. chains will be the reducer function passed to the ReduceChain, e.g.
summarize_using_stuff_method = ReduceChain(llm=llm, reducer_chain=StuffChain())
summarize_using_refine_method = ReduceChain(llm=llm, reducer_chain=RefineChain())
...
fact_extractor = ReduceChain(llm=llm, reducer_chain=FactExtractChain())
Hope that makes sense.
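As a rough sketch of the idea that one reduce wrapper can take different reducers: SimpleReduce below is a hypothetical plain-Python stand-in for the proposed ReduceChain, and stuff and collect_words are made-up reducers. None of these names come from the PR or from LangChain, and no LLM is involved.

```python
from functools import reduce
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class SimpleReduce(Generic[T]):
    """Hypothetical stand-in for the proposed ReduceChain (not the PR's code)."""

    def __init__(self, reducer: Callable[[T, str], T]) -> None:
        self.reducer = reducer

    def run(self, input_values: List[str], initial_value: T) -> T:
        # Fold the inputs into one value using the supplied reducer.
        return reduce(self.reducer, input_values, initial_value)


def stuff(acc: str, text: str) -> str:
    # Concatenate all texts into one string, loosely like "stuffing".
    return f"{acc}\n{text}" if acc else text


def collect_words(acc: List[str], text: str) -> List[str]:
    # Accumulate a flat list whose length is independent of the input length.
    return acc + text.split()


stuffed = SimpleReduce(stuff).run(["a b", "c"], "")
words = SimpleReduce(collect_words).run(["a b", "c"], [])
print(repr(stuffed), words)
```

The wrapper stays the same in both cases; only the reducer and the initial value change, which is the generalization being argued for.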
@IndikaUdagedara Hi, could you please address the last comments (if needed)? After that, ping me and I'll push this PR for review. Thanks!
Closing because the PR wouldn't line up with the current directory structure of the library (would need to be in /libs/langchain/langchain instead of /langchain). Feel free to reopen against the current head if it's still relevant!