openinference icon indicating copy to clipboard operation
openinference copied to clipboard

[bug] Beeai - missing retriever span

Open annguyenarize opened this issue 6 months ago • 2 comments

Describe the bug When running a simple RAG agent, retriever spans are not captured.

To Reproduce

import asyncio
import sys
import traceback

from beeai_framework.adapters.beeai.backend.vector_store import TemporalVectorStore
from beeai_framework.adapters.langchain.mappers.documents import lc_document_to_document
from beeai_framework.backend.embedding import EmbeddingModel
from beeai_framework.backend.vector_store import VectorStore
from beeai_framework.errors import FrameworkError

from beeai_framework.agents.experimental.rag import RAGAgent, RagAgentRunInput
from beeai_framework.adapters.openai import OpenAIChatModel
from beeai_framework.memory import UnconstrainedMemory

import os
from dotenv import load_dotenv
from arize.otel import register
from openinference.instrumentation.beeai import BeeAIInstrumentor
from beeai_framework.backend import UserMessage

load_dotenv()

arize_space_id = os.getenv("ARIZE_SPACE_ID")
arize_api_key = os.getenv("ARIZE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

tracer_provider = register(
    space_id=arize_space_id,
    api_key=arize_api_key,
    project_name="beeai-cookbook",
)

BeeAIInstrumentor().instrument(tracer_provider=tracer_provider)

# LC dependencies - to be swapped with BAI dependencies
try:
    from langchain_community.document_loaders import UnstructuredMarkdownLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
except ModuleNotFoundError as e:
    raise ModuleNotFoundError(
        "Optional modules are not found.\nRun 'pip install \"beeai-framework[rag]\"' to install."
    ) from e


async def main() -> None:
    embedding_model = EmbeddingModel.from_name("openai:text-embedding-ada-002", truncate_input_tokens=500)

    # Document loading
    loader = UnstructuredMarkdownLoader(file_path="docs/modules/agents.mdx")
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=1000)
    all_splits = text_splitter.split_documents(docs)
    documents = [lc_document_to_document(document) for document in all_splits]
    print(f"Loaded {len(documents)} documents")

    vector_store: TemporalVectorStore = VectorStore.from_name(
        name="beeai:TemporalVectorStore", embedding_model=embedding_model
    )  # type: ignore[assignment]
    _ = await vector_store.add_documents(documents=documents)
    
    llm = OpenAIChatModel(model="gpt-4o-mini")

    agent = RAGAgent(llm=llm, memory=UnconstrainedMemory(), vector_store=vector_store)
    response = await agent.run(RagAgentRunInput(message=UserMessage("What agents are available in BeeAI?")))
    print(response.message.text)


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())

Expected behavior Retrieval should be instrumented

Additional context Add any other context about the problem here.

annguyenarize avatar Aug 28 '25 15:08 annguyenarize

The BeeAIInstrumentor currently does not instrument retriever spans—only agents, chat models, and tools are traced automatically. To capture retriever spans (for example, when your RAG agent queries the vector store), you’ll need to add manual instrumentation. The recommended approach is to use Phoenix decorators like @tracer.chain or tracer.start_as_current_span around your retrieval logic, and set the span kind to RETRIEVER using OpenInference semantic conventions. When adding retrieval documents to spans, flatten the attributes (e.g., retrieval.documents.{i}.document.id, retrieval.documents.{i}.document.content) for best compatibility with the UI [details & example].

For a full example of manual instrumentation in a RAG workflow—including retrieval, chunking, prompt composition, and LLM calls—see the tracing_and_evals_weaviate.ipynb notebook. Manual instrumentation provides better UI integration and control than generic OpenTelemetry packages [discussion].

If you need a template for instrumenting retrieval, let me know what your retrieval code looks like and I can help you adapt it!

To reply, just mention @dosu.


How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other  Chat with Dosu Join Discord Share on X

dosubot[bot] avatar Aug 28 '25 16:08 dosubot[bot]

Good observation. I will add support for them soon.

Tomas2D avatar Aug 29 '25 09:08 Tomas2D