Langfuse tags and user_id on Haystack

Open · luiscastejon-aily opened this issue 11 months ago · 1 comment

Is your feature request related to a problem? Please describe.
We’ve started using Haystack to build LLM pipelines and want to track their usage with Langfuse. We use a function that returns a Haystack pipeline containing the LLM and the Langfuse tracer.

Currently, we rely heavily on tags in Langfuse (as we do with LangChain) to monitor usage across teams, projects, and other contexts, but there doesn’t seem to be a way to do this in Haystack. Assigning a user to a trace is another key feature we use, and we haven’t found a way to do that in Haystack either. This gap limits our ability to track and manage usage effectively.

Describe the solution you’d like
We’d like to be able to attach Langfuse tags and a user_id to a trace in Haystack pipelines so we can track usage across teams and projects. Seamless support for these features would significantly improve our workflow.

Describe alternatives you’ve considered
We’ve looked into workarounds, such as taking the trace URL after getting the response and setting these parameters afterwards, but this approach is cumbersome and would require changing the way we usually work with LLMs.

Jan 08 '25 13:01 luiscastejon-aily

Hey @luiscastejon-aily, this is already possible with the current Langfuse integration. It's not well documented yet, which is why it's easy to miss. The following shows how you can pass a custom trace_id, user_id, and tags for a given pipeline run:

from haystack_integrations.tracing.langfuse.tracer import tracing_context_var

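# Start from the current context (an empty dict if none has been set) and copy it
current_ctx = tracing_context_var.get({})
new_ctx = current_ctx.copy()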
new_ctx["trace_id"] = "custom-trace-123"
new_ctx["user_id"] = "user-456"
new_ctx["tags"] = ["production"]

# Set the new context
token = tracing_context_var.set(new_ctx)

# pipe is your Haystack pipeline with Langfuse tracing enabled
response = pipe.run(...)

# Optionally reset the context later
tracing_context_var.reset(token)
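Because `tracing_context_var` is a standard `contextvars.ContextVar`, the set/reset pair can be wrapped in a context manager so the reset also happens when `pipe.run` raises. This is a minimal sketch: `langfuse_trace_context` is a hypothetical helper name, and a local `ContextVar` stands in for the integration's `tracing_context_var` so the snippet runs without langfuse-haystack installed (in real code you would use the import shown above instead).

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Any, Dict, Iterator

# Stand-in for haystack_integrations.tracing.langfuse.tracer.tracing_context_var
# (assumption: swap in the real import in your own code).
tracing_context_var: ContextVar[Dict[str, Any]] = ContextVar("tracing_context")


@contextmanager
def langfuse_trace_context(**fields: Any) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: temporarily merge `fields` into the tracing context."""
    # Build a new dict so the one held by any outer context is never mutated.
    new_ctx = {**tracing_context_var.get({}), **fields}
    token = tracing_context_var.set(new_ctx)
    try:
        yield new_ctx
    finally:
        # Restore the previous value (or "unset" state), even if the body raised.
        tracing_context_var.reset(token)


with langfuse_trace_context(user_id="user-456", tags=["production"]):
    # response = pipe.run(...) would go here
    print(tracing_context_var.get({}))
print(tracing_context_var.get({}))
```

Nested `with` blocks merge their fields, and each inner block's values are dropped again when it exits.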

This works for the following context keys:

"trace_id"
"user_id"
"session_id"
"tags"
"version"
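Since the tracer only reads these keys, a misspelled key such as "userid" would presumably be ignored silently rather than raise an error. A small guard can catch that early. The sketch below is hypothetical: `check_langfuse_context` and the idea of rejecting unknown keys are not part of the integration, and the allowed set simply mirrors the list above.

```python
from typing import Any, Dict

# Keys the tracer reads from the tracing context, mirroring the list above.
LANGFUSE_CONTEXT_KEYS = frozenset({"trace_id", "user_id", "session_id", "tags", "version"})


def check_langfuse_context(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical guard: fail fast on keys or values the tracer is unlikely to use."""
    unknown = set(ctx) - LANGFUSE_CONTEXT_KEYS
    if unknown:
        raise ValueError(f"Unsupported Langfuse context keys: {sorted(unknown)}")
    # Langfuse tags are a list of strings; passing a bare string is a common slip.
    if isinstance(ctx.get("tags"), str):
        raise TypeError("tags must be a list of strings, e.g. ['production']")
    return ctx


check_langfuse_context({"user_id": "user-456", "tags": ["production"]})  # passes
```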

We could look into adding a convenience function that makes it clearer what can be passed, perhaps something like:

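from haystack_integrations.tracing.langfuse.tracer import tracing_context_var


def set_langfuse_context(trace_id=None, user_id=None, session_id=None, tags=None, version=None):
    """Merge the given Langfuse fields into the tracing context and return the reset token."""
    current_ctx = tracing_context_var.get({})
    new_ctx = current_ctx.copy()  # never mutate the dict held by an outer context
    if trace_id is not None:
        new_ctx["trace_id"] = trace_id
    if user_id is not None:
        new_ctx["user_id"] = user_id
    if session_id is not None:
        new_ctx["session_id"] = session_id
    if tags is not None:
        new_ctx["tags"] = tags
    if version is not None:
        new_ctx["version"] = version
    return tracing_context_var.set(new_ctx)


def reset_langfuse_context(token=None):
    """Restore the context saved in `token` if given, otherwise clear it."""
    if token is not None:
        tracing_context_var.reset(token)
    else:
        tracing_context_var.set({})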
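One reason a ContextVar suits this better than a module-level dict: concurrent asyncio tasks each work on their own copy of the context, so two requests for different users do not overwrite each other's user_id. The sketch below shows that standard-library behavior using a stand-in variable and a hypothetical `handle_request` coroutine. Whether the Langfuse spans actually pick up the value still depends on the pipeline running in that same context (an assumption, not verified here).

```python
import asyncio
from contextvars import ContextVar
from typing import Any, Dict, List

# Stand-in for the integration's tracing_context_var (assumption).
tracing_context_var: ContextVar[Dict[str, Any]] = ContextVar("tracing_context")


async def handle_request(user_id: str) -> str:
    """Hypothetical request handler: tag the context, then 'run' the pipeline."""
    token = tracing_context_var.set({**tracing_context_var.get({}), "user_id": user_id})
    try:
        await asyncio.sleep(0.01)  # yield so the other tasks interleave
        # A real handler would run the pipeline here.
        return tracing_context_var.get({})["user_id"]
    finally:
        tracing_context_var.reset(token)


async def main() -> List[str]:
    # gather wraps each coroutine in its own Task, and each Task copies the context.
    return await asyncio.gather(*(handle_request(u) for u in ["alice", "bob", "carol"]))


print(asyncio.run(main()))  # ['alice', 'bob', 'carol']
```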

Would this cover the use case you had in mind?

Apr 02 '25 14:04 sjrl