Would it be possible to expose the `usage` payload of the OpenAI response?
It would be really useful to track the number of tokens consumed. But the information is not bubbled up. I gather this may not be feasible across providers though?
Hi @Lawouach Do you mean surfacing the usage data from openai API responses?
This looks like

```json
"usage": { "prompt_tokens": 5, "completion_tokens": 5, "total_tokens": 10 }
```
Since prompt-functions return a value corresponding to the return type annotation, this information would have to be reported through some other method.
One option would be to add hooks that allow you to register functions to be run before/after `OpenaiChatModel.complete`. Something like
```python
token_usage = 0

def increment_token_usage(message: AssistantMessage):
    global token_usage  # needed to rebind the module-level counter
    token_usage += message.usage.total_tokens

@prompt(
    "Tell me a joke",
    post_completion=increment_token_usage,
)
def tell_joke(): ...
```
Other options might be
- Add a context manager that tracks token usage for prompt-functions called within it. Like in LangChain https://python.langchain.com/docs/modules/model_io/llms/token_usage_tracking
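The context-manager option could be sketched in plain Python like below. This is a hypothetical illustration, not magentic's API: the names `track_token_usage`, `TokenUsageTracker`, and `report_usage` are made up, and the usage numbers are simulated rather than coming from real completions.

```python
# Hypothetical sketch of a token-usage-tracking context manager.
# A model wrapper would call report_usage() after each completion;
# here those calls are simulated with hardcoded numbers.
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass

_tracker: ContextVar["TokenUsageTracker | None"] = ContextVar("_tracker", default=None)


@dataclass
class TokenUsageTracker:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


@contextmanager
def track_token_usage():
    """Yield a tracker that collects usage reported while the context is open."""
    tracker = TokenUsageTracker()
    token = _tracker.set(tracker)
    try:
        yield tracker
    finally:
        _tracker.reset(token)


def report_usage(prompt_tokens: int, completion_tokens: int) -> None:
    """Would be called by the model after each completion (simulated here)."""
    tracker = _tracker.get()
    if tracker is not None:
        tracker.prompt_tokens += prompt_tokens
        tracker.completion_tokens += completion_tokens


with track_token_usage() as usage:
    report_usage(5, 5)  # e.g. first prompt-function call
    report_usage(7, 3)  # e.g. second call
print(usage.total_tokens)
# > 20
```

Using a `ContextVar` rather than a plain global means nested or concurrent contexts would each see their own tracker, which is roughly how the LangChain callback approach linked above works.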
It seems like adding usage to the AssistantMessage class is necessary/useful in general. And if that were present, you could add the hooks by subclassing OpenaiChatModel, modifying .complete to update the token counter, and then passing this class as the model to @prompt. I would support this approach for now, until there are more use cases to justify a more complex solution.
Hi @jackmpcollins that would be enough of a solution for my use case indeed. I think it would generalize well too.
Usage stats are not returned by the OpenAI API when streaming responses (which magentic does for all responses under the hood).
Javascript package issue comment suggests this is coming soon: https://github.com/openai/openai-node/issues/506#issuecomment-1857289838
Developer community post requesting this: https://community.openai.com/t/openai-api-get-usage-tokens-in-response-when-set-stream-true/141866?u=jackmpcollins
Corresponding openai python client issue is https://github.com/openai/openai-python/issues/1053
Yay, they seem to have shipped it.
@Lawouach I've published a prerelease to test having a .usage attribute on AssistantMessage. Could you test it out and let me know if it works for your use case please. One thing to note is that usage only becomes available (not None) once the streamed response has reached the end. This happens before return for most types, but for streamed types like StreamedStr and Iterable it happens after these have been fully iterated over.
```shell
pip install "magentic==0.25.0a0"
```
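The timing caveat above can be illustrated without calling the API. Below is a minimal plain-Python stand-in (not magentic's actual implementation) showing why `usage` stays `None` until a streamed response has been fully consumed: the usage stats only arrive in the final chunk of the stream.

```python
# Plain-Python stand-in (not magentic internals) for a streamed response
# whose usage is only known once the stream has been exhausted.
class FakeStreamedStr:
    def __init__(self, chunks, final_usage):
        self._chunks = chunks
        self._final_usage = final_usage
        self.usage = None  # unknown until the stream ends

    def __iter__(self):
        for chunk in self._chunks:
            yield chunk
        # The last stream event carries the usage payload
        self.usage = self._final_usage


stream = FakeStreamedStr(["Hel", "lo!"], {"total_tokens": 10})
print(stream.usage)  # > None  (stream not yet consumed)
text = "".join(stream)
print(text, stream.usage)  # > Hello! {'total_tokens': 10}
```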
I have some notes on the PR https://github.com/jackmpcollins/magentic/pull/214
For the solution above, to create a wrapper ChatModel that does something with usage, your code would look something like below. You could pass this model as the `model` argument to `@prompt` etc.
```python
from typing import Any, Callable, Iterable, TypeVar

from magentic import AssistantMessage, OpenaiChatModel, UserMessage
from magentic.chat_model.base import ChatModel
from magentic.chat_model.message import Message

R = TypeVar("R")


class LoggingChatModel(ChatModel):
    def __init__(self, chat_model: ChatModel):
        self.chat_model = chat_model

    def complete(
        self,
        messages: Iterable[Message[Any]],
        functions: Iterable[Callable[..., Any]] | None = None,
        output_types: Iterable[type[R]] | None = None,
        *,
        stop: list[str] | None = None,
    ) -> AssistantMessage[str] | AssistantMessage[R]:
        response = self.chat_model.complete(
            messages=messages,
            functions=functions,
            output_types=output_types,
            stop=stop,
        )
        print("usage:", response.usage)  # "Logging"
        return response

    async def acomplete(self, *args, **kwargs):  # Stub to satisfy the ABC
        raise NotImplementedError


chat_model = LoggingChatModel(OpenaiChatModel("gpt-3.5-turbo", seed=42))
message = chat_model.complete(messages=[UserMessage("Say hello!")])
# > usage: Usage(input_tokens=10, output_tokens=9)
print(message.content)
# > Hello! How can I assist you today?
```
@Lawouach This is released now in https://github.com/jackmpcollins/magentic/releases/tag/v0.26.0 Please let me know how it works for you.