agenta [Enhancement]: observability integration without using agenta hosted apps

Description

This PR proposes integrating observability into LLM applications with Agenta, even if the applications are not hosted on Agenta.

Related Issue

Closes cloud_#338. Related: commons_#43

Additional Information

A pre-alpha version of the SDK has been published to PyPI for testing. You can check the obs-app on cloud.beta and review the observability traces created with the example in this PR.

Apr 26 '24 12:04 aybruhm

The questions I am asking are:

How can we simplify instrumentation as much as possible and remove anything that is not required?

ag.init()

Not strictly required for the instrumentation itself. When the agenta singleton does not exist, we can initialize the tracing object from the environment variables (or the input variables) instead.

llm_tracing: We can keep this as a way to use tracing with ag.init(). For the pure observability use case, we can use the Tracing constructor directly and fall back to the environment variables if nothing is provided.

@ag.entrypoint → I think we should have a separate decorator (ag.trace) that only initializes the tracing object and calls start_parent_span, so that it has a single responsibility.

@ag.span: Nothing changes there.
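
A decorator with that single responsibility could be sketched roughly as follows. This is only an illustration under stated assumptions: StubTracing is a hypothetical stand-in rather than the SDK's Tracing class, and end_recording and last_tracing are names made up for the sketch. Only start_parent_span comes from the discussion above.

```python
import asyncio
import functools
import inspect


class StubTracing:
    """Hypothetical stand-in for the SDK's Tracing class (assumption)."""

    def __init__(self):
        self.events = []

    def start_parent_span(self, name, inputs):
        self.events.append(("start", name, inputs))

    def end_recording(self, outputs):  # hypothetical method name
        self.events.append(("end", outputs))


def trace(func):
    """Only create the tracing object and open/close the parent span."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        tracing = StubTracing()
        # Map positional and keyword arguments to parameter names.
        bound = inspect.signature(func).bind(*args, **kwargs)
        tracing.start_parent_span(func.__name__, dict(bound.arguments))
        result = await func(*args, **kwargs)
        tracing.end_recording(result)
        wrapper.last_tracing = tracing  # exposed only for inspection in this sketch
        return result

    return wrapper


@trace
async def generate(country: str, gender: str):
    return {"message": f"name for {gender} from {country}"}


result = asyncio.run(generate("France", gender="female"))
```

Keeping the decorator this small means it carries no deployment logic, so it can be stacked with ag.entrypoint only when the app is also deployed to agenta.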

Thanks for the review, @mmabrouk. I thought about this last night and built a POC to test whether the implementation would work, and it did.

Observability would still work if we:

remove ag.init and use env vars instead
construct the Tracing class directly and make use of the env vars
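
The two points above can be sketched as a constructor that prefers explicit arguments and otherwise reads the environment. The class below is a simplified stand-in, not the SDK's real Tracing class. AGENTA_API_KEY is the variable used in this thread, while AGENTA_APP_ID is a hypothetical name chosen for illustration.

```python
import os
from typing import Optional


class Tracing:
    """Simplified stand-in, not the SDK's real class (assumption)."""

    def __init__(self, api_key: Optional[str] = None, app_id: Optional[str] = None):
        # Explicit arguments take precedence over environment variables.
        self.api_key = api_key or os.environ.get("AGENTA_API_KEY")
        self.app_id = app_id or os.environ.get("AGENTA_APP_ID")  # hypothetical variable name
        if not self.api_key:
            raise ValueError("Set AGENTA_API_KEY or pass api_key explicitly")


# Placeholder values for the sketch.
os.environ["AGENTA_API_KEY"] = "xxx"
os.environ["AGENTA_APP_ID"] = "app-from-env"

from_env = Tracing()                        # everything read from the environment
explicit = Tracing(app_id="app-explicit")   # explicit app_id overrides the env var
```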

Regarding having ag.trace, here's how it would be:

@ag.trace # ag.entrypoint will be added after ag.trace in the case where the user wants to deploy their LLM app to agenta
async def generate(country: str, gender: str):
    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }

May 01 '24 06:05 aybruhm

Consider the case where someone has two spans in the same code that belong to two different apps:

import os

# set the env vars
os.environ["AGENTA_API_KEY"] = "xxx"  # placeholder value

@ag.span(type="LLM")
async def llm_call(prompt):
    chat_completion = await client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    tracing.set_span_attribute(
        "model_config", {"model": "gpt-3.5-turbo", "temperature": ag.config.temperature}
    )  # translates to {"model_config": {"model": "gpt-3.5-turbo", "temperature": 0.2}}
    tokens_usage = chat_completion.usage.dict()
    return {
        "cost": ag.calculate_token_usage("gpt-3.5-turbo", tokens_usage),
        "message": chat_completion.choices[0].message.content,
        "usage": tokens_usage,
    }

@ag.span(app_id="")
async def generate(country: str, gender: str):
    """
    Generate a baby name based on the given country and gender.

    Args:
        country (str): The country to generate the name from.
        gender (str): The gender of the baby.
    """

    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }

@ag.span(app_id="")
async def some_other_logic(country: str, gender: str):
    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    response = await llm_call(prompt=prompt)
    return {
        "message": response["message"],
        "usage": response["usage"],
        "cost": response["cost"],
    }

The issue in the second example is that we currently rely on a single Tracing singleton, so maybe this use case is not feasible right now?

I'd love to hear your thoughts.

The use case is feasible. We just need to revise our implementation of Tracing to support multiple singletons if we want different spans for different applications.

The behaviour of Tracing would not change. The only thing we would need to do is modify the Tracing constructor to support multiple singletons and introduce a mechanism that switches between them based on the provided app_id.
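
One way to picture this is a registry that keeps one instance per app_id. The sketch below is a simplified stand-in under that assumption, not the current SDK code; the spans attribute is included only to show that each app gets its own state.

```python
import threading
from typing import Dict, Optional


class Tracing:
    """Simplified stand-in: one shared instance per app_id (assumption)."""

    _instances: Dict[Optional[str], "Tracing"] = {}
    _lock = threading.Lock()

    def __new__(cls, app_id: Optional[str] = None):
        # Return the existing instance for this app_id, or create it once.
        with cls._lock:
            if app_id not in cls._instances:
                instance = super().__new__(cls)
                instance.app_id = app_id
                instance.spans = []
                cls._instances[app_id] = instance
            return cls._instances[app_id]


first = Tracing(app_id="app-a")
second = Tracing(app_id="app-b")
again = Tracing(app_id="app-a")  # same object as `first`
```

With a lookup like this, a span decorator that receives app_id could select the matching instance instead of the single global one.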

May 01 '24 06:05 aybruhm

Thanks a lot @aybruhm, great work!

I still have work to do to finish the review, but I like the current state a lot (I reviewed all the code, but still have some usability notes to go through).

I thought I'd share some of my early comments:

It would be great to have examples for the different workflows in a simple file; we can later turn these into documentation. The workflows are: (1) using only tracing from bash, (2) using only tracing from an app they host themselves, (3) using tracing + entrypoint from bash, and (4) using tracing + entrypoint hosted in agenta. What does the code look like for each, and how do they set the variables in each case (variant_id, environment...)?

I don't see the @trace decorator we discussed, the one that would be used in cases 1 and 2.

I think it makes sense to create unit tests (with mocks) for the SDK. The logic is complicated and we need to make sure it works. The first tests could check whether the information is being set correctly, others could cover the span/trace and how they work together, and others the singleton logic (whether you can access the span in multiple places).
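
A first test of that kind could look like the sketch below, using unittest.mock from the standard library. run_with_span, start_span, and end_span are hypothetical names used only for illustration; the real SDK methods would replace them.

```python
import unittest
from unittest.mock import MagicMock


def run_with_span(tracing, name, func, **inputs):
    """Hypothetical helper: wrap a call in a span on the given tracing object."""
    tracing.start_span(name=name, inputs=inputs)
    result = func(**inputs)
    tracing.end_span(outputs=result)
    return result


class SpanRecordingTest(unittest.TestCase):
    def test_inputs_and_outputs_are_recorded(self):
        tracing = MagicMock()  # stands in for the real tracing client
        result = run_with_span(
            tracing, "llm_call", lambda prompt: {"message": prompt.upper()}, prompt="hi"
        )

        self.assertEqual(result, {"message": "HI"})
        tracing.start_span.assert_called_once_with(name="llm_call", inputs={"prompt": "hi"})
        tracing.end_span.assert_called_once_with(outputs={"message": "HI"})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SpanRecordingTest)
outcome = unittest.TextTestRunner().run(suite)
```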

Thanks again for the great work

Thank you for the comments, @mmabrouk. I'll proceed to work on what's missing.

May 09 '24 08:05 aybruhm

Hi @mmabrouk,

I have worked on the first and second comments. Kindly check the workflows folder within examples/app_with_observability in this PR for the different use cases of using Agenta that we discussed in Slack.

May 09 '24 20:05 aybruhm
