
Add Chain equivalents of Map() and Reduce()

Open IndikaUdagedara opened this issue 1 year ago • 4 comments

These are two chains that provide map() and reduce() behaviour. I used them as follows:

Setup:

# this is a 'reducer' chain - similar to the callbackFn here  https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce
extraction_chain = ExtractionChain(llm=llm) 

# this is a 'mapper' chain
transform_chain = TransformChain(input_variables=["input"], output_variables=["output"], transform=transform_func)

reduce_chain = ReduceChain(llm=llm, reducer_chain=extraction_chain)
map_chain = MapChain(llm=llm, mapper_chain=transform_chain)

Usage:

# I want to extract facts from multiple documents and transform them into a certain format

docs = ["a.pdf", "b.pdf"]

doc_list = [PyPDFLoader(doc).load_and_split() for doc in docs]
texts = [item.page_content for sublist in doc_list for item in sublist]

reduce_response = reduce_chain.run(input_values=texts, initial_value=[])
map_response = map_chain.run(input_values=reduce_response)
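
For readers unfamiliar with the pattern, the intended data flow can be sketched in plain Python without any LLM. This is only an illustration of the semantics, not the proposed implementation: `extract_facts` and `to_record` are hypothetical stand-ins for the extraction chain and `transform_func`, and the "facts" rule is a toy.

```python
from functools import reduce
from typing import Dict, List


# Hypothetical stand-in for the LLM-backed reducer chain: takes the facts
# gathered so far plus one text and returns the updated list of facts.
def extract_facts(facts: List[str], text: str) -> List[str]:
    # Toy rule: every sentence containing " is " counts as a "fact".
    found = [s.strip() for s in text.split(".") if " is " in s]
    return facts + found


# Hypothetical stand-in for transform_func used by the mapper chain.
def to_record(fact: str) -> Dict[str, str]:
    subject, _, value = fact.partition(" is ")
    return {"subject": subject, "value": value}


texts = [
    "Paris is the capital of France. Berlin is in Germany.",
    "No facts here.",
    "Water is wet. Python is a language.",
]

# Reduce: fold all texts into a single list of facts, starting from [].
facts = reduce(extract_facts, texts, [])

# Map: transform each fact independently; one output per input fact.
records = [to_record(fact) for fact in facts]
```

Three input texts yield four facts here, which is the point of the reduce step: the output is a single accumulated value whose size is unrelated to the number of inputs. The map step then preserves length.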

If you think this is useful I'll polish it up + add docs and tests.

Thank you for this amazing project!

IndikaUdagedara • May 04 '23 10:05

hey @IndikaUdagedara, i definitely think this idea has legs. have you tried using the MapReduceDocumentsChain by chance? curious if that is similar to what you're thinking of

dev2049 • May 05 '23 21:05

Yes, I had a look at that one, but it was not 'primitive' enough for my use case, i.e. it did more than I wanted: it collects metadata, manipulates Docs rather than strs, etc. My use case is: I have a bunch of texts from which I want to extract 'facts' and return all of them as a list, then pass that list to a mapper which transforms it into a certain format. I suppose I could use the SummarizeChain for this purpose if it allowed customizing the prompt, but then again summarization is just one form of reduction, so I thought a Reduce chain would be a way to generalize that.

I suppose we could say MapReduceDocumentsChain is a special case of a Reduce chain + Map chain?

IndikaUdagedara • May 06 '23 03:05

@IndikaUdagedara all chains have an apply method on them - that covers the map use case right?

could you motivate the reduce use case a bit more?

hwchase17 • May 14 '23 02:05

@hwchase17 ah, I must've missed the apply method. I suppose it covers the map use case then (which is basically applying a chain to a list and returning a list of the same length).

The reduce chain is somewhat a generalization of the SummarizeChain. What I want is to be able to pass a list of texts to the chain that performs some operation on each item and produce a single value as the output (equivalent to Python or JS reduce() functions). My use case specifically is: I have a list of documents and I want to get the facts out of all of them. The output is a collection of facts that is not necessarily the same length as the original list.

The way this can be generalized to the summarization use case is: the stuff, map_reduce, refine etc. chains would be the reducer function passed to the ReduceChain, e.g.

summarize_using_stuff_method = ReduceChain(llm=llm, reducer_chain=StuffChain())
summarize_using_refine_method = ReduceChain(llm=llm, reducer_chain=RefineChain())

...
fact_extractor = ReduceChain(llm=llm, reducer_chain=FactExtractChain())
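
To make the "reducer as a plug-in" idea concrete, here is a minimal plain-Python sketch. Everything in it is an assumption for illustration: `run_reduce`, `refine_reducer`, and `stuff_reducer` are hypothetical names, and the reducers only imitate the shape of the refine and stuff strategies without calling an LLM.

```python
from functools import reduce
from typing import Any, Callable, Sequence

# A reducer takes the accumulated value and the next text, and returns
# the new accumulated value.
Reducer = Callable[[Any, str], Any]


def run_reduce(reducer: Reducer, texts: Sequence[str], initial: Any) -> Any:
    """Fold the texts into a single value using the supplied reducer."""
    return reduce(reducer, texts, initial)


# Hypothetical "refine"-style reducer: updates a running summary one text
# at a time (here it just appends each text's first sentence).
def refine_reducer(summary: str, text: str) -> str:
    first_sentence = text.split(".")[0].strip()
    return f"{summary} {first_sentence}.".strip()


# Hypothetical "stuff"-style reducer: only concatenates the texts; a real
# chain would then send the joined text to the LLM in a single call.
def stuff_reducer(buffer: str, text: str) -> str:
    return f"{buffer}\n{text}" if buffer else text


texts = ["Water is wet. Ice is cold.", "Nothing to see. Fire is hot."]

refined = run_reduce(refine_reducer, texts, "")
stuffed = run_reduce(stuff_reducer, texts, "")
```

The driver stays the same in both cases; only the reducer and the initial value change, which is the generalization being argued for.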

Hope that makes sense.

IndikaUdagedara • May 16 '23 06:05

@IndikaUdagedara Hi, could you please address the last comments (if needed)? After that, ping me and I'll push this PR for review. Thanks!

leo-gan • Sep 13 '23 01:09

Closing because the PR wouldn't line up with the current directory structure of the library (would need to be in /libs/langchain/langchain instead of /langchain). Feel free to reopen against the current head if it's still relevant!

efriis • Nov 07 '23 04:11