mirascope
Track costs for streaming with Cohere
Is your feature request related to a problem? Please describe.
Many providers are starting to include usage data in streamed responses. This makes it much easier for Mirascope to calculate cost.
Describe the solution you'd like
Add a total_cost property to CohereCallResponseChunk. Read the "event_type": "stream-end" event sent by the Cohere API and calculate cost using:
"token_count": {
    "prompt_tokens": ...,
    "response_tokens": ...,
    "total_tokens": ...,
    "billed_tokens": ...,
}
Update https://github.com/Mirascope/mirascope/blob/dev/mirascope/cohere/utils.py as necessary.
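As a rough sketch of the calculation described above (not the actual Mirascope implementation): the function name cost_from_stream_event, the two price constants, and the assumption that token_count sits at the top level of the event are all hypothetical and only for illustration. Real prices depend on the model.

```python
from typing import Optional

# Hypothetical prices in USD per million tokens; real values depend on the model.
PROMPT_PRICE_PER_MILLION = 0.5
RESPONSE_PRICE_PER_MILLION = 1.5


def cost_from_stream_event(event: dict) -> Optional[float]:
    """Return the cost for a "stream-end" event, or None for any other event.

    Assumes the event is a plain dict and that "token_count" is a top-level key.
    """
    if event.get("event_type") != "stream-end":
        return None
    token_count = event.get("token_count")
    if token_count is None:
        return None
    prompt_cost = token_count["prompt_tokens"] * PROMPT_PRICE_PER_MILLION / 1_000_000
    response_cost = (
        token_count["response_tokens"] * RESPONSE_PRICE_PER_MILLION / 1_000_000
    )
    return prompt_cost + response_cost
```

Only the final event of a stream carries usage, so every earlier chunk would report no cost under this approach.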
See #214 since these are related.
Namely: https://github.com/Mirascope/mirascope/issues/214#issuecomment-2098893697
I am working on this, but the problem I am facing is that the event returned by co.chat_stream() is of type StreamedChatResponse and its response property is of type NonStreamedChatResponse, which does not have a token_count property. I am not sure how to access token_count here.
Doesn't the NonStreamedChatResponse type have response.meta.billed_units, which returns ApiMetaBilledUnits, from which we should be able to grab the same usage statistics that we do for the normal response? We can likely massage that data into the form we need to calculate cost, right?
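A minimal sketch of what "massaging" the billed units could look like. BilledUnitsSketch is a stand-in written for this example, not the real ApiMetaBilledUnits, and its field names (input_tokens, output_tokens) are assumptions; usage_from_billed_units is likewise a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BilledUnitsSketch:
    """Stand-in for ApiMetaBilledUnits; the field names are assumed."""

    input_tokens: Optional[float] = None
    output_tokens: Optional[float] = None


def usage_from_billed_units(
    billed_units: Optional[BilledUnitsSketch],
) -> Optional[dict]:
    """Normalize billed units into integer token counts, or None if absent."""
    if billed_units is None:
        return None
    return {
        # Treat a missing count as zero so the cost calculation never sees None.
        "input_tokens": int(billed_units.input_tokens or 0),
        "output_tokens": int(billed_units.output_tokens or 0),
    }
```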
Yes, it does, but according to the API docs, the streamed response has no meta.billed_units property. It does have token_count, though. I can look again at what is happening on the API side and update here.
This is partially implemented by #307: Cohere chunks will now contain input_tokens and output_tokens, which can be used to calculate cost. The only thing left to do is to pass cost into CohereCallResponseChunk.
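One way the remaining step might look, as a sketch only. ChunkSketch is a simplified stand-in, not the real CohereCallResponseChunk, and with_cost plus its per-million price parameters are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ChunkSketch:
    """Simplified stand-in for a streamed chunk carrying optional usage."""

    text: str
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    cost: Optional[float] = None


def with_cost(
    chunk: ChunkSketch,
    input_price_per_million: float,
    output_price_per_million: float,
) -> ChunkSketch:
    """Return a copy of the chunk with cost filled in when usage is present."""
    if chunk.input_tokens is None or chunk.output_tokens is None:
        # Intermediate chunks carry no usage, so their cost stays None.
        return chunk
    cost = (
        chunk.input_tokens * input_price_per_million
        + chunk.output_tokens * output_price_per_million
    ) / 1_000_000
    return replace(chunk, cost=cost)
```

Leaving cost as None on intermediate chunks, rather than 0, keeps "no usage reported yet" distinguishable from a call that cost nothing.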