[feature] Mock LLM by recording and replaying responses
Feature request
Components that contain an LLM are hard to unit-test, because their output is not deterministic and they rely on an external API that can fail.
So I propose a way to mock LLM output by simply recording and replaying the responses.
Motivation
It could be helpful in a TDD-based workflow, where we want to refactor without changing behavior.
Your contribution
I've made an example in my personal project, which dumps the output to a JSON file.
The implementation:
class MockOpenAI(OpenAI):
    from_file: Optional[Path] = None
    to_file: Optional[Path] = None
    records: List[LLMResult] = []

    # generate() is overridden to do the recording and replaying
https://github.com/ofey404/WalkingShadows/blob/2cd39f6286193845ba3018bb2bcd42a7ff736fe9/src/backend/services/world/internal/llm/llm.py#L18-L21
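To show the shape of the idea without clicking through, here is a minimal, self-contained sketch of what such an overridden generate() could look like. The JSON layout, the replay-in-order behavior, and the method body are my assumptions, not the exact code from the linked file:

import json
from pathlib import Path
from typing import Any, List, Optional

from langchain.llms import OpenAI
from langchain.schema import LLMResult


class MockOpenAI(OpenAI):
    from_file: Optional[Path] = None
    to_file: Optional[Path] = None
    records: List[LLMResult] = []

    def generate(
        self, prompts: List[str], stop: Optional[List[str]] = None, **kwargs: Any
    ) -> LLMResult:
        if self.from_file is not None:
            # Replay mode: load the recorded results once, then hand them back
            # in order instead of calling the OpenAI API. (Assumed behavior.)
            if not self.records:
                data = json.loads(self.from_file.read_text())
                self.records = [LLMResult(**item) for item in data]
            return self.records.pop(0)
        # Record mode: call the real API and persist every result as JSON.
        result = super().generate(prompts, stop=stop, **kwargs)
        if self.to_file is not None:
            self.records.append(result)
            self.to_file.write_text(json.dumps([r.dict() for r in self.records]))
        return result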
The usage:
MockOpenAI(
    # to_file=Path(__file__).parent / "test_world.json",
    from_file=Path(__file__).parent / "test_world.json",
)
https://github.com/ofey404/WalkingShadows/blob/2cd39f6286193845ba3018bb2bcd42a7ff736fe9/src/backend/services/world/api/world/test/test_world.py#L13C1-L17
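To make the record/replay workflow concrete, a hypothetical pytest-style test could look like the following: run it once with to_file enabled to record real responses, then switch to from_file so later runs replay them offline. The chain, prompt, and assertion are illustrative, not copied from the linked test:

from pathlib import Path

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# MockOpenAI as sketched above.


def test_world():
    llm = MockOpenAI(
        openai_api_key="unused-when-replaying",  # dummy key; use a real one when recording
        # First run: enable to_file to record real responses to disk.
        # to_file=Path(__file__).parent / "test_world.json",
        # Subsequent runs: replay them without touching the API.
        from_file=Path(__file__).parent / "test_world.json",
    )
    chain = LLMChain(
        llm=llm,
        prompt=PromptTemplate.from_template("Describe the world of {name}."),
    )
    result = chain.run(name="Walking Shadows")
    assert result  # deterministic and API-free on every replay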
If this seems appropriate, I'd like to contribute it to LangChain, and I would refine the interface to make it more generic.
Is anyone interested in this? I'd appreciate some support from the maintainers.
@hwchase17 @agola11
I believe the current FakeListLLM combined with a BaseCallbackHandler can be used for this purpose.
Thank you! It's neat.
Hi, any chance you could share an example? 🙏
EDIT: I have figured it out:
from typing import Any, Dict, List, Union

from langchain.callbacks.base import BaseCallbackHandler
from langchain.llms.fake import FakeListLLM


class FakeListLLM(FakeListLLM):
    def model_name(self) -> str:
        return "fake-list-llm"


class CustomCallbackHandler(BaseCallbackHandler):
    """Save chain inputs and LLM prompts."""

    def __init__(self) -> None:
        super().__init__()
        self.input_dict = None
        self.input_prompts = None

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    ) -> None:
        """Run when chain starts running."""
        self.input_dict = inputs

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        """Run when LLM starts running."""
        self.input_prompts = prompts

    @property
    def always_verbose(self) -> bool:
        """Whether to call verbose callbacks even if verbose is False."""
        return True

    # ##### Everything below this point is the default (no-op) behavior

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        """Run on new LLM token. Only available when streaming is enabled."""

    def on_llm_end(self, response, **kwargs: Any) -> None:
        """Run when LLM ends running."""

    def on_llm_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> None:
        """Run when LLM errors."""

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
        """Run when chain ends running."""

    def on_chain_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> None:
        """Run when chain errors."""

    def on_tool_start(
        self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
    ) -> None:
        """Run when tool starts running."""

    def on_tool_end(self, output: str, **kwargs: Any) -> None:
        """Run when tool ends running."""

    def on_tool_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> None:
        """Run when tool errors."""

    def on_text(self, text: str, **kwargs: Any) -> None:
        """Run on arbitrary text."""

    def on_agent_action(self, action, **kwargs: Any) -> None:
        """Run on agent action."""

    def on_agent_finish(self, finish, **kwargs: Any) -> None:
        """Run on agent end."""
Then I used it later as follows:
FakeListLLM(
    responses=["<<TESTING>>"] * 128, callbacks=[CustomCallbackHandler()]
)
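If you also want to read back what the handler captured, keep a reference to it and wire it into a chain, roughly like this (the chain and prompt are illustrative):

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

handler = CustomCallbackHandler()
llm = FakeListLLM(responses=["<<TESTING>>"] * 128, callbacks=[handler])

chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Describe the world of {name}."),
    callbacks=[handler],  # so on_chain_start sees the chain inputs too
)
output = chain.run(name="Walking Shadows")

assert output == "<<TESTING>>"  # deterministic fake response
print(handler.input_dict)       # e.g. {'name': 'Walking Shadows'}
print(handler.input_prompts)    # e.g. ['Describe the world of Walking Shadows.']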