
Together LLM (Completions) generate() function's output is missing generation_info and llm_output

Open kerkkoh opened this issue 6 months ago • 3 comments

Checked other resources

  • [X] I added a very descriptive title to this issue.
  • [X] I searched the LangChain documentation with the integrated search.
  • [X] I used the GitHub search to find a similar question and didn't find it.
  • [X] I am sure that this is a bug in LangChain rather than my code.
  • [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code


from langchain_together import Together
from llama_recipes.inference.prompt_format_utils import (
    build_default_prompt,
    create_conversation,
    LlamaGuardVersion,
) # LlamaGuard 3 Prompt from https://github.com/meta-llama/llama-recipes/blob/main/src/llama_recipes/inference/prompt_format_utils.py
from pydantic.v1.types import SecretStr

t = Together(
    model="meta-llama/Meta-Llama-Guard-3-8B",
    together_api_key=SecretStr("<=== API Key goes here ===>"),
    max_tokens=35,
    logprobs=1,
    temperature=0
)

# Expected to return an LLMResult object whose Generations include logprobs, and whose llm_output includes usage
res = t.generate([build_default_prompt("User", create_conversation(["<Sample user prompt>"]), LlamaGuardVersion["LLAMA_GUARD_3"])])

print(res.json()) # {"generations": [[{"text": "safe", "generation_info": null, "type": "Generation"}]], "llm_output": null, "run": [{"run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"}]}

Error Message and Stack Trace (if applicable)

No response

Description

I'm trying to use the langchain_together library's Together class to call the Together.ai LLM completions endpoint, and I expect to get an LLMResult with logprobs inside generation_info and usage in llm_output.

Instead, the following incomplete LLMResult is returned:

{
    "generations": [
        [
            {
                "text": "safe",
                "generation_info": null,
                "type": "Generation"
            }
        ]
    ],
    "llm_output": null,
    "run": [
        {
            "run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"
        }
    ]
}

where both generation_info and llm_output are None.

This should be fixed by updating the langchain_together.Together class so that it populates generation_info and llm_output whenever the API response includes the corresponding fields (a rough sketch of one possible approach follows the expected output below). The expected output is:

{
    "generations": [
        [
            {
                "text": "safe",
                "generation_info": {
                    "finish_reason": "eos",
                    "logprobs": {
                        "tokens": [
                            "safe",
                            "<|eot_id|>"
                        ],
                        "token_logprobs": [
                            -4.6014786e-05,
                            -0.008911133
                        ],
                        "token_ids": [
                            19193,
                            128009
                        ]
                    }
                },
                "type": "Generation"
            }
        ]
    ],
    "llm_output": {
        "token_usage": {
            "total_tokens": 219,
            "completion_tokens": 2,
            "prompt_tokens": 217
        },
        "model_name": "meta-llama/Meta-Llama-Guard-3-8B"
    },
    "run": [
        {
            "run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"
        }
    ]
}
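
For reference, here is a rough, untested sketch of what populating those fields could look like. The PatchedTogether class, its fields, and the attribute access on the together SDK response are assumptions based on the response shape shown above, not the actual langchain_together implementation.

from typing import Any, List, Optional

import together
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import BaseLLM
from langchain_core.outputs import Generation, LLMResult

class PatchedTogether(BaseLLM):
    """Hypothetical completions LLM that keeps finish_reason, logprobs and usage."""

    model: str
    together_api_key: str
    max_tokens: int = 35
    logprobs: Optional[int] = None
    temperature: float = 0.0

    @property
    def _llm_type(self) -> str:
        return "together-patched"

    def _generate(
        self,
        prompts: List[str],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> LLMResult:
        client = together.Together(api_key=self.together_api_key)
        generations = []
        token_usage = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
        for prompt in prompts:
            # Together's completions endpoint takes one string prompt per request.
            resp = client.completions.create(
                model=self.model,
                prompt=prompt,
                max_tokens=self.max_tokens,
                logprobs=self.logprobs,
                temperature=self.temperature,
                stop=stop,
            )
            choice = resp.choices[0]
            generations.append([
                Generation(
                    text=choice.text,
                    # Keep the extra response fields instead of discarding them.
                    # Depending on the SDK version, logprobs may be a pydantic
                    # model and might need .model_dump() here.
                    generation_info={
                        "finish_reason": choice.finish_reason,
                        "logprobs": choice.logprobs,
                    },
                )
            ])
            if resp.usage is not None:
                token_usage["prompt_tokens"] += resp.usage.prompt_tokens
                token_usage["completion_tokens"] += resp.usage.completion_tokens
                token_usage["total_tokens"] += resp.usage.total_tokens
        return LLMResult(
            generations=generations,
            llm_output={"token_usage": token_usage, "model_name": self.model},
        )

With a change along those lines inside langchain_together itself, the first snippet above would produce the expected output.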

This could technically be worked around by using langchain_openai's OpenAI class instead, but that class's generate method is no longer compatible with the old OpenAI Completions-style API that Together.ai uses. Specifically, the underlying BaseOpenAI._generate method calls the OpenAI completions client with a list[str] of prompts, which Together.ai doesn't support.

Just in case someone finds this issue while looking for a fix, I have a workaround for the workaround. The langchain_openai problem can be bodged around by overriding the client's completions.create method after initializing the LLM class, delegating to the together Python library's equivalent method and dropping arguments the Together API doesn't support. The following is a quick example:

import together
from langchain_openai import OpenAI
from pydantic.v1.types import SecretStr
from llama_recipes.inference.prompt_format_utils import (
    build_default_prompt,
    create_conversation,
    LlamaGuardVersion,
) # LlamaGuard 3 Prompt from https://github.com/meta-llama/llama-recipes/blob/main/src/llama_recipes/inference/prompt_format_utils.py

together_client = together.Together(api_key="<=== API Key goes here ===>")
llm = OpenAI(
    model="meta-llama/Meta-Llama-Guard-3-8B",
    api_key=SecretStr("<=== API Key goes here ===>"),
    base_url="https://api.together.xyz/v1", # This may be redundant as we override the create class method anyways
    max_tokens=200,
    logprobs=1,
    temperature=0
)

def overridden_create(prompt: list[str], **kwargs):
    # Overridden openai.client.completions.create method to use the Together client, as Together doesn't support certain inputs (e.g. seed) and lists of prompts
    together_allowed_keys = ["model", "prompt", "max_tokens", "stream", "stop", "temperature", "top_p", "top_k", "repetition_penalty", "logprobs", "echo", "n", "safety_model"]
    kwargs = {k: v for k, v in kwargs.items() if k in together_allowed_keys}
    return together_client.completions.create(prompt=prompt[0], **kwargs)

llm.client.create = overridden_create
llm_result = llm.generate([build_default_prompt("User", create_conversation(["<Sample user prompt>"]), LlamaGuardVersion["LLAMA_GUARD_3"])])

print(llm_result.json()) # {"generations": [[{"text": "safe", "generation_info": {"finish_reason": "eos", "logprobs": {"tokens": ["safe", "<|eot_id|>"], "token_logprobs": [-4.6014786e-05, -0.008911133], "token_ids": [19193, 128009]}}, "type": "Generation"}]], "llm_output": {"token_usage": {"total_tokens": 219, "completion_tokens": 2, "prompt_tokens": 217}, "model_name": "meta-llama/Meta-Llama-Guard-3-8B"}, "run": [{"run_id": "f015adc7-7558-4251-9fe6-9d11a646c173"}]}

generation = llm_result.generations[0][0]
logprobs = generation.generation_info["logprobs"] # Wow, it works!
token_usage = llm_result.llm_output["token_usage"] # Wow, we also get usage!
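
Note that this only patches the synchronous path; if you use llm.agenerate(), the async client needs the same treatment. A minimal, untested sketch, assuming the together package exposes an AsyncTogether client in your installed version and that the async path goes through llm.async_client.create:

async_together_client = together.AsyncTogether(api_key="<=== API Key goes here ===>")

async def overridden_acreate(prompt: list[str], **kwargs):
    # Same argument filtering as above, but awaiting the async Together client
    together_allowed_keys = ["model", "prompt", "max_tokens", "stream", "stop", "temperature", "top_p", "top_k", "repetition_penalty", "logprobs", "echo", "n", "safety_model"]
    kwargs = {k: v for k, v in kwargs.items() if k in together_allowed_keys}
    return await async_together_client.completions.create(prompt=prompt[0], **kwargs)

llm.async_client.create = overridden_acreate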

System Info

langchain==0.2.14
langchain-core==0.2.32
langchain-openai==0.1.21
langchain-text-splitters==0.2.2
langchain-together==0.1.5

mac (Macbook Pro M1 16GB, 2021), macOS Sonoma 14.5 (23F79)

Python 3.9.19

kerkkoh · Aug 15 '24 15:08