Together LLM (Completions) generate() function's output is missing generation_info and llm_output
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
from langchain_together import Together
from llama_recipes.inference.prompt_format_utils import (
    build_default_prompt,
    create_conversation,
    LlamaGuardVersion,
)  # LlamaGuard 3 Prompt from https://github.com/meta-llama/llama-recipes/blob/main/src/llama_recipes/inference/prompt_format_utils.py
from pydantic.v1.types import SecretStr

t = Together(
    model="meta-llama/Meta-Llama-Guard-3-8B",
    together_api_key=SecretStr("<=== API Key goes here ===>"),
    max_tokens=35,
    logprobs=1,
    temperature=0,
)
# Expected to return an LLMResult object with Generations that have logprobs, and llm_output with usage
res = t.generate([build_default_prompt("User", create_conversation(["<Sample user prompt>"]), LlamaGuardVersion["LLAMA_GUARD_3"])])
print(res.json()) # {"generations": [[{"text": "safe", "generation_info": null, "type": "Generation"}]], "llm_output": null, "run": [{"run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"}]}
Error Message and Stack Trace (if applicable)
No response
Description
I'm trying to use the langchain_together library's Together class to call the Together.ai LLM completions endpoint, expecting to get an LLMResult with logprobs inside generation_info and token usage in llm_output.
Instead, the following incomplete LLMResult is returned:
{
  "generations": [
    [
      {
        "text": "safe",
        "generation_info": null,
        "type": "Generation"
      }
    ]
  ],
  "llm_output": null,
  "run": [
    {
      "run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"
    }
  ]
}
where generation_info is None and llm_output is None.
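For clarity, a minimal illustration (continuing from the reproduction code above) of why this matters downstream:
generation = res.generations[0][0]
print(generation.generation_info)  # None
print(res.llm_output)              # None
# so e.g. generation.generation_info["logprobs"] raises TypeError: 'NoneType' object is not subscriptable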
This should be fixed by updating the langchain_together.Together class so that it returns LLMResults with generation_info and llm_output populated whenever the API response includes the corresponding fields (a rough sketch of one possible override is included after the expected output below). The expected output is:
{
  "generations": [
    [
      {
        "text": "safe",
        "generation_info": {
          "finish_reason": "eos",
          "logprobs": {
            "tokens": [
              "safe",
              "<|eot_id|>"
            ],
            "token_logprobs": [
              -4.6014786e-05,
              -0.008911133
            ],
            "token_ids": [
              19193,
              128009
            ]
          }
        },
        "type": "Generation"
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "total_tokens": 219,
      "completion_tokens": 2,
      "prompt_tokens": 217
    },
    "model_name": "meta-llama/Meta-Llama-Guard-3-8B"
  },
  "run": [
    {
      "run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"
    }
  ]
}
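As referenced above, here is a minimal sketch of how the metadata could be surfaced. This is only an illustration, not the actual langchain_together implementation: TogetherWithMetadata is a hypothetical subclass, and it assumes the together SDK's completion response exposes choices[i].text, choices[i].finish_reason, choices[i].logprobs, and usage in the same shape as the raw API payload shown above.
from typing import Any, List, Optional

import together
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.outputs import Generation, LLMResult
from langchain_together import Together


class TogetherWithMetadata(Together):  # hypothetical subclass, only a sketch
    def _generate(
        self,
        prompts: List[str],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> LLMResult:
        # Assumes together_api_key is a SecretStr, as in the reproduction code above
        client = together.Together(api_key=self.together_api_key.get_secret_value())
        generations = []
        token_usage = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
        for prompt in prompts:
            # One request per prompt, since the Together completions API takes a single prompt string
            response = client.completions.create(
                model=self.model,
                prompt=prompt,
                max_tokens=self.max_tokens,
                temperature=self.temperature,
                logprobs=self.logprobs,
                stop=stop,
            )
            choice = response.choices[0]
            # Assumed response attributes; convert pydantic objects to plain dicts where needed
            logprobs = getattr(choice, "logprobs", None)
            if logprobs is not None and hasattr(logprobs, "model_dump"):
                logprobs = logprobs.model_dump()
            generations.append(
                [
                    Generation(
                        text=choice.text,
                        generation_info={
                            "finish_reason": getattr(choice, "finish_reason", None),
                            "logprobs": logprobs,
                        },
                    )
                ]
            )
            if getattr(response, "usage", None) is not None:
                for key in token_usage:
                    token_usage[key] += getattr(response.usage, key, 0) or 0
        return LLMResult(
            generations=generations,
            llm_output={"token_usage": token_usage, "model_name": self.model},
        )
With an override along these lines, t.generate(...) would return Generations carrying generation_info and an llm_output dict, matching the expected output above.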
This could technically be avoided by using the langchain_openai library's langchain_openai.OpenAI, but the generate method of that class is no longer compatible with the old OpenAI Completions-style API that Together.ai uses. Mainly, the underlying BaseOpenAI._generate method calls the OpenAI completions client with a list[str] of prompts, which Together.ai doesn't support.
Just in case someone finds this issue looking for a fix, I have a workaround for the workaround. The problem with the langchain_openai workaround can be bodged by overriding the openai client's completions.create method after initializing the LLM class, using the together python library's equivalent method and dropping the arguments that the Together API doesn't support. The following is a quick example:
import together
from langchain_openai import OpenAI
from pydantic.v1.types import SecretStr
from llama_recipes.inference.prompt_format_utils import (
    build_default_prompt,
    create_conversation,
    LlamaGuardVersion,
)  # LlamaGuard 3 Prompt from https://github.com/meta-llama/llama-recipes/blob/main/src/llama_recipes/inference/prompt_format_utils.py

together_client = together.Together(api_key="<=== API Key goes here ===>")
llm = OpenAI(
    model="meta-llama/Meta-Llama-Guard-3-8B",
    api_key=SecretStr("<=== API Key goes here ===>"),
    base_url="https://api.together.xyz/v1",  # This may be redundant as we override the create method anyway
    max_tokens=200,
    logprobs=1,
    temperature=0,
)

def overridden_create(prompt: list[str], **kwargs):
    # Overridden openai client completions.create method that uses the Together client instead,
    # as Together doesn't support certain inputs (e.g. seed) or lists of prompts
    together_allowed_keys = ["model", "prompt", "max_tokens", "stream", "stop", "temperature", "top_p", "top_k", "repetition_penalty", "logprobs", "echo", "n", "safety_model"]
    kwargs = {k: v for k, v in kwargs.items() if k in together_allowed_keys}
    return together_client.completions.create(prompt=prompt[0], **kwargs)

llm.client.create = overridden_create
llm_result = llm.generate([build_default_prompt("User", create_conversation(["<Sample user prompt>"]), LlamaGuardVersion["LLAMA_GUARD_3"])])
print(llm_result.json()) # {"generations": [[{"text": "safe", "generation_info": {"finish_reason": "eos", "logprobs": {"tokens": ["safe", "<|eot_id|>"], "token_logprobs": [-4.6014786e-05, -0.008911133], "token_ids": [19193, 128009]}}, "type": "Generation"}]], "llm_output": {"token_usage": {"total_tokens": 219, "completion_tokens": 2, "prompt_tokens": 217}, "model_name": "meta-llama/Meta-Llama-Guard-3-8B"}, "run": [{"run_id": "f015adc7-7558-4251-9fe6-9d11a646c173"}]}
generation = llm_result.generations[0][0]
logprobs = generation.generation_info["logprobs"] # Wow, it works!
token_usage = llm_result.llm_output["token_usage"] # Wow, we also get usage!
System Info
langchain==0.2.14
langchain-core==0.2.32
langchain-openai==0.1.21
langchain-text-splitters==0.2.2
langchain-together==0.1.5
Mac (MacBook Pro M1 16GB, 2021), macOS Sonoma 14.5 (23F79)
Python 3.9.19