
[Question] Docs comparing Burr to other frameworks?

Open mecampbellsoup opened this issue 6 months ago • 4 comments

(This is a bit of a cross post with https://github.com/BrainBlend-AI/atomic-agents/issues/140.)

We are looking for an agentic and probably graph-based framework for our AI assistant chatbot.

So far we have developed with instructor alone, but we're exploring frameworks to manage some concerns for us, including (we think) state management (i.e. a single, central API used throughout our code to read and write the data that constitutes the "current context" of the conversation) and, more importantly, orchestration (i.e. given the conversation so far, what should the bot's next response be?).
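The "single, central API" idea above can be sketched as a minimal read/write store. This is purely illustrative (the class and method names are made up, not from any of the frameworks mentioned):

```python
class ConversationContext:
    """A single, central object for reading and writing the
    data that makes up the current conversation context."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        """Record a piece of conversation state (e.g. a classification)."""
        self._data[key] = value

    def read(self, key, default=None):
        """Look up a piece of conversation state, with a fallback."""
        return self._data.get(key, default)
```

A framework's value-add over something this simple is mostly persistence, visibility, and tying state reads/writes to transitions.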

For example, our chat flow starts out like this:

  1. User sends a message to our assistant
  2. We attempt to classify (via LLM call) their message and assign a "conversation type"
  3. If the user's message is classified with at least 1 category/type, and the LLM reports a confidence of at least 90%, we proceed to another flow that we call "extraction" whereby we extract semantic information from the user's text, and ask follow up questions for anything we are missing to be able to answer their question for the given conversation type.
  4. If the user's message is NOT classified, or the classification confidence score is < 90%, we want our bot to ask the user to clarify or expound upon their initial message.
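The routing in steps 3 and 4 can be sketched in plain Python. Everything here is illustrative (the dataclasses and the `route` function are hypothetical, not any framework's API); the point is that the state-machine transition is just a comparison against the 90% threshold:

```python
from dataclasses import dataclass, field
from typing import Optional

CONFIDENCE_THRESHOLD = 0.90  # the 90% cutoff from step 3


@dataclass
class Classification:
    """Result of the LLM classification call (step 2)."""
    conversation_type: Optional[str]  # None when the LLM could not classify
    confidence: float


@dataclass
class ConversationState:
    """Central mutable state threaded through the flow."""
    messages: list = field(default_factory=list)
    conversation_type: Optional[str] = None


def route(state: ConversationState, result: Classification) -> str:
    """Pick the next node: 'extraction' (step 3) or 'clarify' (step 4)."""
    if result.conversation_type is not None and result.confidence >= CONFIDENCE_THRESHOLD:
        state.conversation_type = result.conversation_type
        return "extraction"
    return "clarify"
```

In a graph framework, `route` would become a conditional edge and `ConversationState` the managed state object.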

In a perfect world, we'd like to simply encapsulate this logic in Pydantic models, throw the structured input/output schemas at the LLM, and have it all Just Work ™! atomic-agents emphasizes structured inputs and outputs as its hallmark selling point, so this appeals to us, but it doesn't seem to do much by way of state management, and I think we need that concept in order to have the state machine-like behavior emphasized by burr.

This project, burr, and atomic-agents are currently our top 2 candidates. langgraph is probably the best-known incumbent, so we're considering that too. pydantic-ai also seems to have a fair amount of overlap... and by now you can see our heads are spinning!

burr seems more focused on managing statefulness, and building your application lifecycle around state to define transitions among nodes in your state machine's graph.

atomic-agents seems to exclusively emphasize structured inputs and outputs and encapsulating those along w/ any configuration into "tools". So it is basically a wrapper on top of instructor and pydantic.

Is anyone aware of a blog post or some other publicly available source that compares these agentic frameworks?

Do any obvious pros/cons come to mind? If so, please share - I'm sure many people are in a similar position of evaluating these framework options.

mecampbellsoup avatar Jun 26 '25 18:06 mecampbellsoup

Interestingly, pydantic-ai seems to warn pretty loudly about not reaching for a graph if you don't absolutely need one: https://ai.pydantic.dev/graph/

mecampbellsoup avatar Jun 26 '25 19:06 mecampbellsoup

@mecampbellsoup thanks for the issue.

Yeah we don't have anything like that written up yet! Would you like to contribute?

Short story for Burr is that:

  1. It's the only Apache-governed project -- so you can invest, become part of the community, and help drive direction, versus being beholden to VC dollars pushing growth and then, at some point, monetization...
  2. To my knowledge, the Burr UI is the only open-source UI with observability tooling built specifically to help you observe and inspect both LLM calls and state.
  3. Burr's design philosophy is to ensure that productionization isn't painful to customize (e.g. langgraph forces you to bring in langchain for any customization) and that the framework doesn't get in the way of future iteration (see this blog). For example, you bring the LLM calls yourself -- Burr does not make them on your behalf -- so you always know exactly what prompt you sent to the LLM, because it's under your control; we think that's ultimately the best approach.

> 1. User sends a message to our assistant
> 2. We attempt to classify (via LLM call) their message and assign a "conversation type"
> 3. If the user's message is classified with at least 1 category/type, and the LLM reports a confidence of at least 90%, we proceed to another flow that we call "extraction" whereby we extract semantic information from the user's text, and ask follow up questions for anything we are missing to be able to answer their question for the given conversation type.
> 4. If the user's message is NOT classified, or the classification confidence score is < 90%, we want our bot to ask the user to clarify or expound upon their initial message.

What I counsel people to do is a bake-off. Write this code vanilla -- frameworks should really just be syntactic sugar over vanilla options. Be sure to include observability and logging concerns in your bake-off, since POC code doesn't look like production code without them. It might very well be that you haven't graduated to a complex enough problem if all you have is some if/else statements...
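To make the bake-off suggestion concrete, here is a hedged sketch of the vanilla version of the flow described earlier, with the logging concern included. The function and parameter names (`handle_message`, `classify`, `extract`, `clarify`) are hypothetical; the LLM calls are injected so they stay under your control, as the answer recommends:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("assistant")


def handle_message(text, classify, extract, clarify, threshold=0.90):
    """Vanilla orchestration: the whole 'graph' is one if/else.

    classify(text) -> (label_or_None, confidence)
    extract(text, label) and clarify(text) produce the bot's response.
    """
    label, confidence = classify(text)
    # The observability concern: log every routing decision.
    logger.info("classified %r as %s (confidence=%.2f)", text, label, confidence)
    if label is not None and confidence >= threshold:
        return extract(text, label)
    return clarify(text)
```

If the vanilla version stays this small, that's a useful signal that a graph framework may be premature, which is roughly the warning in the pydantic-ai docs linked above.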

Does that help any?

skrawcz avatar Jun 30 '25 00:06 skrawcz

Hello @skrawcz It seems the company that created Burr was acquired by Salesforce. Is Burr Cloud a Salesforce/Agentforce offering? Do most of the committers or PMC members work at Salesforce?

mingdaoy avatar Jul 01 '25 06:07 mingdaoy

> Hello @skrawcz It seems the company that created Burr was acquired by Salesforce. Is Burr Cloud a Salesforce/Agentforce offering? Do most of the committers or PMC members work at Salesforce?

The Burr Cloud offering was shut down.

Most of the PMC don't work at Salesforce, actually.

skrawcz avatar Jul 01 '25 06:07 skrawcz