feat(langfuse): add cost and usage support for more generators and generally for embedders
Is your feature request related to a problem? Please describe. Langfuse supports cost and usage details only for generations and embeddings. tracer.py converts a whitelist of generators to the Langfuse type "generation", which makes usage and cost tracking work. Here is the whitelist:
_SUPPORTED_GENERATORS = [
    "AzureOpenAIGenerator",
    "OpenAIGenerator",
    "AnthropicGenerator",
    "HuggingFaceAPIGenerator",
    "HuggingFaceLocalGenerator",
    "CohereGenerator",
    "OllamaGenerator",
]

_SUPPORTED_CHAT_GENERATORS = [
    "AmazonBedrockChatGenerator",
    "AzureOpenAIChatGenerator",
    "OpenAIChatGenerator",
    "AnthropicChatGenerator",
    "HuggingFaceAPIChatGenerator",
    "HuggingFaceLocalChatGenerator",
    "CohereChatGenerator",
    "OllamaChatGenerator",
    "GoogleGenAIChatGenerator",
]
However, generators like Mistral are missing from the whitelist, so they won't be created as generations:
elif context.component_type in _ALL_SUPPORTED_GENERATORS:
    return LangfuseSpan(self.tracer.start_as_current_observation(name=context.name, as_type="generation"))
Also, embedders are always created as type "span", so their cost and usage are ignored.
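For illustration, the fallback behavior can be sketched as follows (a simplified, hypothetical reduction of the tracer.py dispatch; the whitelists are truncated here):

```python
# Simplified sketch of the observation-type dispatch in tracer.py
# (hypothetical reduction; whitelists truncated for brevity).
_SUPPORTED_GENERATORS = ["AzureOpenAIGenerator", "OpenAIGenerator"]
_SUPPORTED_CHAT_GENERATORS = ["OpenAIChatGenerator", "AnthropicChatGenerator"]
_ALL_SUPPORTED_GENERATORS = _SUPPORTED_GENERATORS + _SUPPORTED_CHAT_GENERATORS


def observation_type(component_type: str) -> str:
    """Return the Langfuse observation type for a Haystack component."""
    if component_type in _ALL_SUPPORTED_GENERATORS:
        return "generation"
    # Anything else, including MistralChatGenerator and all embedders,
    # becomes a plain "span" and loses cost/usage tracking.
    return "span"


print(observation_type("OpenAIChatGenerator"))   # generation
print(observation_type("MistralChatGenerator"))  # span
print(observation_type("OpenAITextEmbedder"))    # span
```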
Describe the solution you'd like Add the missing compatible generators and embedders. I already did it for Mistral models: #2463
Describe alternatives you've considered I tried to use openinference-ai, as Langfuse recommends it, but that instrumentor suffers from the same problems. So let's solve it in Haystack!
Another working approach would be to use the Langfuse Python SDK to register model definitions in Langfuse. With the help of a regex, Langfuse matches the model name and calculates costs from the usage extracted from metadata. A pitfall here are custom pricing models that don't rely on tokens, e.g. per page (OCR) or per request (data APIs). This approach also means Langfuse calculates the costs for you.
Additional context For my pipeline I need cost and usage tracking for Mistral models. I solved it locally and it only needs small code changes. The challenging part for a PR is supporting embedders from any provider: the usage has to be extracted from the metadata and put into the fields Langfuse expects. Be aware that Langfuse does many things in the background, which can cause problems while debugging. Double-check the results.
In addition, my pipelines have more components with cost and usage that I need to track, and Langfuse has no support for this. The best workaround so far was to wrongly flag everything as a generator and, after this issue, possibly as an embedder. The interesting part is that Langfuse offers more observation types than generation and embedding (#2473). However, for those types it drops attributes like cost_details and usage_details. If Langfuse adjusts this, tracer.py may need an update again.
@vblagoje How should we proceed? What about adding MistralChatGenerator to the whitelist, and a whitelist for embedders that share a base class?
Yes, sure @Hansehart. For the longest time we have talked about string-based checks for whether a component is a (Chat)Generator, so that we don't have to add ChatGenerators manually. Custom ChatGenerators would then be supported automatically as well. So before adding embedder support, perhaps we can do that, wdyt? We even have an issue for that: https://github.com/deepset-ai/haystack-core-integrations/issues/2050
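A minimal sketch of such a string-based check (hypothetical; the actual design is discussed in issue #2050 and may differ):

```python
def observation_type(component_type: str) -> str:
    """Hypothetical name-based dispatch: any component whose class name
    ends in "Generator" (which also covers "ChatGenerator") is treated
    as a Langfuse generation, so MistralChatGenerator and custom
    generators match without whitelist maintenance."""
    if component_type.endswith("Generator"):
        return "generation"
    return "span"


print(observation_type("MistralChatGenerator"))  # generation
print(observation_type("OpenAIGenerator"))       # generation
print(observation_type("DocumentSplitter"))      # span
```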
Do you want to contribute that one first? It would register MistralChatGenerator automatically, and with that fix we can immediately release a new langfuse-haystack package.
After the merge of #2497 it's done.
Ok, let's rebase/merge and proceed with this one now @Hansehart
@vblagoje It's an honor. However, I am quite busy at the moment. Nonetheless, let's talk about it. What do you have in mind for the rebase? From my POV, only special components are still missing cost and usage tracking, e.g. external data APIs, OCR, rerankers... how should we deal with those? I wish Langfuse would add cost and usage tracking for spans.
🙏 @Hansehart I thought we could attempt to parse these with some heuristics, if possible and if it's worth the trouble. Most likely every provider has a different token format, but perhaps we can do this at least for embedders?
@vblagoje I checked the Langfuse SDK - they already use heuristics in _parse_usage() to handle different formats. For embeddings specifically, they detect it by checking if usage has only 2 fields (prompt_tokens + total_tokens).
Their approach: accept any usage format, flatten/sanitize it, and send it to the backend along with the model name; Langfuse calculates the costs from there.
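The embedding detection described above can be sketched like this (illustrative only, not the actual Langfuse SDK code):

```python
def looks_like_embedding_usage(usage: dict) -> bool:
    # Heuristic described above: embedding usage typically carries exactly
    # two fields, prompt_tokens and total_tokens (no completion_tokens).
    return set(usage.keys()) == {"prompt_tokens", "total_tokens"}


print(looks_like_embedding_usage({"prompt_tokens": 8, "total_tokens": 8}))
# True
print(looks_like_embedding_usage(
    {"prompt_tokens": 8, "completion_tokens": 4, "total_tokens": 12}))
# False
```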
For Haystack embedders, we can follow the same pattern as generators:
Here's what the implementation would look like in DefaultSpanHandler.handle() (after line 393):
elif component_type and component_type.endswith("Embedder"):
    # Extract metadata from the embedder output
    output_data = span.get_data().get(_COMPONENT_OUTPUT_KEY, {})
    meta = output_data.get("meta")
    if meta:
        # Try common provider formats using heuristics
        usage = meta.get("usage") or meta.get("billed_units")
        # Sanitize and flatten any nested structures
        sanitized_usage = _sanitize_usage_data(usage) if usage else None
        # Update the span with usage details and model name
        span.raw_span().update(
            usage_details=sanitized_usage,
            model=meta.get("model"),
        )
Do you like this approach?
For rerankers/OCR/APIs: those are blocked by Langfuse's observation-type limitations. They currently support cost tracking only for the generation and embedding types. My current workaround is to declare them manually as the generation type.
Sounds good @Hansehart, let's implement this ☝️ and then wind down this round of efforts until we have more clarity on OCR/APIs etc. Perhaps it's best to eventually piggyback on Langfuse's efforts once they add these observation types.
@vblagoje After #2522 I am able to take this on.