Would it be possible to expose the `usage` payload of the OpenAI response?
It would be really useful to track the number of tokens consumed. But the information is not bubbled up. I gather this may not be feasible across providers though?
Hi @Lawouach Do you mean surfacing the usage data from openai API responses?
This looks like

```json
"usage": { "prompt_tokens": 5, "completion_tokens": 5, "total_tokens": 10 }
```
Since prompt-functions return a value corresponding to the return type annotation, this information would have to be reported through some other method.
One option would be to add hooks that allow you to register functions to be run before/after `OpenaiChatModel.complete`. Something like
```python
token_usage = 0

def increment_token_usage(message: AssistantMessage):
    global token_usage  # needed to rebind the module-level counter
    token_usage += message.usage.total_tokens

@prompt(
    "Tell me a joke",
    post_completion=increment_token_usage,
)
def tell_joke(): ...
```
Other options might be
- Add a context manager that tracks token usage for prompt-functions called within it. Like in LangChain https://python.langchain.com/docs/modules/model_io/llms/token_usage_tracking
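The context-manager option could be sketched in plain Python like below. This is a hypothetical illustration, not magentic's API: the names `track_token_usage`, `TokenUsageTracker`, and `report_usage` are made up, and the usage numbers are simulated rather than coming from real completions.

```python
# Hypothetical sketch of a token-usage-tracking context manager.
# A model wrapper would call report_usage() after each completion;
# here those calls are simulated with hardcoded numbers.
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass

_tracker: ContextVar["TokenUsageTracker | None"] = ContextVar("_tracker", default=None)


@dataclass
class TokenUsageTracker:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


@contextmanager
def track_token_usage():
    """Yield a tracker that collects usage reported while the context is open."""
    tracker = TokenUsageTracker()
    token = _tracker.set(tracker)
    try:
        yield tracker
    finally:
        _tracker.reset(token)


def report_usage(prompt_tokens: int, completion_tokens: int) -> None:
    """Would be called by the model after each completion (simulated here)."""
    tracker = _tracker.get()
    if tracker is not None:
        tracker.prompt_tokens += prompt_tokens
        tracker.completion_tokens += completion_tokens


with track_token_usage() as usage:
    report_usage(5, 5)  # e.g. first prompt-function call
    report_usage(7, 3)  # e.g. second call
print(usage.total_tokens)
# > 20
```

Using a `ContextVar` rather than a plain global means nested or concurrent contexts would each see their own tracker, which is roughly how the LangChain callback approach linked above works.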
It seems like adding usage to the AssistantMessage class is necessary/useful in general. And if that were present, you could add the hooks by subclassing OpenaiChatModel, modifying .complete to update the token counter, and then passing this class as the model to @prompt. I would support this approach for now, until there are more use cases to justify a more complex solution.
Hi @jackmpcollins that would be enough of a solution for my use case indeed. I think it would generalize well too.
Usage stats are not returned by the OpenAI API when streaming responses (which magentic does for all responses under the hood).
Javascript package issue comment suggests this is coming soon: https://github.com/openai/openai-node/issues/506#issuecomment-1857289838
Developer community post requesting this: https://community.openai.com/t/openai-api-get-usage-tokens-in-response-when-set-stream-true/141866?u=jackmpcollins
Corresponding openai python client issue is https://github.com/openai/openai-python/issues/1053
Yay, they seem to have shipped it.
@Lawouach I've published a prerelease to test having a .usage attribute on AssistantMessage. Could you test it out and let me know if it works for your use case please. One thing to note is that usage only becomes available (not None) once the streamed response has reached the end. This happens before return for most types, but for streamed types like StreamedStr and Iterable it happens after these have been fully iterated over.
```shell
pip install "magentic==0.25.0a0"
```
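The timing caveat above can be illustrated without calling the API. Below is a minimal plain-Python stand-in (not magentic's actual implementation) showing why `usage` stays `None` until a streamed response has been fully consumed: the usage stats only arrive in the final chunk of the stream.

```python
# Plain-Python stand-in (not magentic internals) for a streamed response
# whose usage is only known once the stream has been exhausted.
class FakeStreamedStr:
    def __init__(self, chunks, final_usage):
        self._chunks = chunks
        self._final_usage = final_usage
        self.usage = None  # unknown until the stream ends

    def __iter__(self):
        for chunk in self._chunks:
            yield chunk
        # The last stream event carries the usage payload
        self.usage = self._final_usage


stream = FakeStreamedStr(["Hel", "lo!"], {"total_tokens": 10})
print(stream.usage)  # > None  (stream not yet consumed)
text = "".join(stream)
print(text, stream.usage)  # > Hello! {'total_tokens': 10}
```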
I have some notes on the PR https://github.com/jackmpcollins/magentic/pull/214
For the solution above, to create a wrapper ChatModel that does something with usage, your code would look something like below. You could pass this model as the `model` argument to `@prompt` etc.
```python
from typing import Any, Callable, Iterable, TypeVar

from magentic import AssistantMessage, OpenaiChatModel, UserMessage
from magentic.chat_model.base import ChatModel
from magentic.chat_model.message import Message

R = TypeVar("R")


class LoggingChatModel(ChatModel):
    def __init__(self, chat_model: ChatModel):
        self.chat_model = chat_model

    def complete(
        self,
        messages: Iterable[Message[Any]],
        functions: Iterable[Callable[..., Any]] | None = None,
        output_types: Iterable[type[R]] | None = None,
        *,
        stop: list[str] | None = None,
    ) -> AssistantMessage[str] | AssistantMessage[R]:
        response = self.chat_model.complete(
            messages=messages,
            functions=functions,
            output_types=output_types,
            stop=stop,
        )
        print("usage:", response.usage)  # "Logging"
        return response

    async def acomplete(self, *args, **kwargs):  # Stub to satisfy the ABC
        raise NotImplementedError


chat_model = LoggingChatModel(OpenaiChatModel("gpt-3.5-turbo", seed=42))
message = chat_model.complete(messages=[UserMessage("Say hello!")])
# > usage: Usage(input_tokens=10, output_tokens=9)
print(message.content)
# > Hello! How can I assist you today?
```
@Lawouach This is released now in https://github.com/jackmpcollins/magentic/releases/tag/v0.26.0 Please let me know how it works for you.