langchain Improve the documentation and semantics of message processors

Open • vkryukov opened this issue 11 months ago • 1 comment

Hello, it's me again :). Today I would like to discuss the documentation and implementation of LLMChain.run_message_processors.

TL;DR:

- The documentation is a bit inconsistent: a doc string in JsonProcessor contradicts the implementation.
- Other than the JsonProcessor example, the message processor format is not formally documented.
- The current implementation does not allow breaking out of processing when an unrecoverable problem is encountered.

Proposals:

- Make the documentation consistent (I can send a PR).
- Allow breaking out of message processing altogether (I can send a PR for this too, but it requires changing the existing contract).

1. Documentation is inconsistent

The doc string for JsonProcessor.run/2 states:

  - `{:cont, %Message{}}` - The returned message replaces the one being
    processed and no additional processors are run.
  - `{:halt, %Message{}}` - Future processors are skipped. The Message is
    returned as a response to the LLM for reporting errors.

The first statement doesn't match the implementation, which keeps chugging along, executing the other message processors in turn. (And if you think about it, under the documented semantics nothing other than the first processor could ever run.) So I think it's just a documentation typo.

Similarly, the processor_return type is defined as:

  @type processor_return :: {:continue, Message.t()} | {:halt, t(), Message.t()}

which is seemingly in conflict with the actual implementation, which expects {:cont, Message.t()} | {:halt, Message.t()}.
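To make the observed behaviour concrete, here is a minimal sketch of the semantics described above, written with `Enum.reduce_while/3`. It is not LangChain's code: `ProcessorSketch`, the plain-map messages, the empty `chain` map, and the `(chain, message)` argument order are all assumptions made for illustration. It only shows that `{:cont, message}` hands the updated message to the next processor, while `{:halt, message}` skips the rest.

```elixir
defmodule ProcessorSketch do
  # Illustrative stand-in, NOT the library implementation.
  # Each processor is assumed to be a 2-arity function (chain, message).
  def run(chain, message, processors) do
    Enum.reduce_while(processors, {:ok, message}, fn processor, {:ok, msg} ->
      case processor.(chain, msg) do
        # :cont -> the returned message replaces the current one and
        # the NEXT processor still runs.
        {:cont, updated} -> {:cont, {:ok, updated}}
        # :halt -> remaining processors are skipped; the message is
        # meant to be sent back to the LLM as an error report.
        {:halt, reply} -> {:halt, {:halted, reply}}
      end
    end)
  end
end

# Hypothetical processors operating on plain maps.
trim = fn _chain, msg -> {:cont, %{msg | content: String.trim(msg.content)}} end
upcase = fn _chain, msg -> {:cont, %{msg | content: String.upcase(msg.content)}} end
reject = fn _chain, _msg -> {:halt, %{role: :user, content: "ERROR: invalid response"}} end

chain = %{}
message = %{role: :assistant, content: "  hello "}

# Both processors run, one after the other.
both_ran = ProcessorSketch.run(chain, message, [trim, upcase])

# `reject` halts the pipeline, so `upcase` never runs.
stopped = ProcessorSketch.run(chain, message, [trim, reject, upcase])
```

If `{:cont, _}` really meant "no additional processors are run", `upcase` could never execute in the first call, which is why the doc string looks like a typo.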

2. Formal documentation can be improved

There is not much documentation in JsonProcessor about the expected format of message processors, and there is no mention of it in either LLMChain.message_processors/2 or LLMChain.process_message/2.

I believe that message processing is pretty useful functionality and deserves more detailed documentation, possibly even a notebook example.
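As a starting point for such documentation, a custom processor could look roughly like the sketch below. Everything in it is hypothetical: `Sketch.Message` is a two-field stand-in for the real message struct, `Sketch.IntegerProcessor` is an invented example, and the `run(chain, message)` shape is assumed from `JsonProcessor.run/2`. It returns `{:cont, message}` on success and `{:halt, message}` with an error for the LLM otherwise.

```elixir
defmodule Sketch.Message do
  # Hypothetical stand-in for the real message struct, reduced to two fields.
  defstruct role: :assistant, content: nil
end

defmodule Sketch.IntegerProcessor do
  alias Sketch.Message

  # Tries to turn the assistant's text content into an integer.
  # The `chain` argument is unused in this sketch.
  def run(_chain, %Message{content: content} = message) when is_binary(content) do
    case Integer.parse(String.trim(content)) do
      {value, ""} ->
        # Success: replace the content and let later processors run.
        {:cont, %{message | content: value}}

      _other ->
        # Failure: skip later processors and report the problem to the LLM.
        {:halt,
         %Message{
           role: :user,
           content: "ERROR: expected a whole number, got: " <> inspect(content)
         }}
    end
  end
end

parsed = Sketch.IntegerProcessor.run(%{}, %Sketch.Message{content: " 42 "})
failed = Sketch.IntegerProcessor.run(%{}, %Sketch.Message{content: "4.2"})
```

Here `parsed` is a `{:cont, _}` tuple carrying the integer `42`, and `failed` is a `{:halt, _}` tuple carrying a user-role error message.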

3. Semantics for breaking out of processing.

Right now, there is no way to break out of message processing: we need to either return {:cont, _m}, in which case the subsequent message processors will be run, or {:halt, _m}, in which case a new message will be sent to the LLM. A third option is missing: we might not want to send any more messages to the LLM or continue with processing, but just return an error from the chain.
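One possible shape for this, purely as a sketch: a third return value, written here as `{:error, reason}`, that stops processing and surfaces an error without another LLM round trip. The tuple shape, the `BreakoutSketch` module, the `{:retry, _}` tag, and the `retries_left` field are all hypothetical and not part of the current contract.

```elixir
defmodule BreakoutSketch do
  # Hypothetical extension of the processor contract, NOT the current API:
  #   {:cont, message}  -> continue with the next processor
  #   {:halt, message}  -> stop, send `message` back to the LLM
  #   {:error, reason}  -> stop, return an error from the chain
  def run(chain, message, processors) do
    Enum.reduce_while(processors, {:ok, message}, fn processor, {:ok, msg} ->
      case processor.(chain, msg) do
        {:cont, updated} -> {:cont, {:ok, updated}}
        {:halt, reply} -> {:halt, {:retry, reply}}
        {:error, reason} -> {:halt, {:error, reason}}
      end
    end)
  end
end

# Hypothetical guard: give up entirely once no retries are left.
check_budget = fn chain, msg ->
  if chain.retries_left > 0 do
    {:cont, msg}
  else
    {:error, :retries_exhausted}
  end
end

# Hypothetical parser: ask the LLM to try again on bad input.
parse_int = fn _chain, msg ->
  case Integer.parse(msg.content) do
    {n, ""} -> {:cont, %{msg | content: n}}
    _other -> {:halt, %{role: :user, content: "ERROR: reply with a whole number"}}
  end
end

processors = [check_budget, parse_int]
bad = %{role: :assistant, content: "abc"}
good = %{role: :assistant, content: "7"}

ok_result = BreakoutSketch.run(%{retries_left: 2}, good, processors)
retry_result = BreakoutSketch.run(%{retries_left: 2}, bad, processors)
error_result = BreakoutSketch.run(%{retries_left: 0}, bad, processors)
```

The caller can then tell the three outcomes apart: a processed message, a message to send back to the LLM, or an error to return from the chain.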

Would you be interested in adding such semantics? Or are there downsides to this that I don't see?

Dec 12 '24 22:12 vkryukov

Hi @vkryukov!

Love it. Yes, that area is lacking good documentation and you've correctly identified how it can be improved. Yes, I'd love some help with that!

Dec 15 '24 23:12 brainlid
