[bug] BeeAI - missing retriever spans
**Describe the bug**
When running a simple RAG agent, retriever spans are not captured.
**To Reproduce**
```python
import asyncio
import sys
import traceback
from beeai_framework.adapters.beeai.backend.vector_store import TemporalVectorStore
from beeai_framework.adapters.langchain.mappers.documents import lc_document_to_document
from beeai_framework.backend.embedding import EmbeddingModel
from beeai_framework.backend.vector_store import VectorStore
from beeai_framework.errors import FrameworkError
from beeai_framework.agents.experimental.rag import RAGAgent, RagAgentRunInput
from beeai_framework.adapters.openai import OpenAIChatModel
from beeai_framework.memory import UnconstrainedMemory
import os
from dotenv import load_dotenv
from arize.otel import register
from openinference.instrumentation.beeai import BeeAIInstrumentor
from beeai_framework.backend import UserMessage

load_dotenv()
arize_space_id = os.getenv("ARIZE_SPACE_ID")
arize_api_key = os.getenv("ARIZE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

tracer_provider = register(
    space_id=arize_space_id,
    api_key=arize_api_key,
    project_name="beeai-cookbook",
)
BeeAIInstrumentor().instrument(tracer_provider=tracer_provider)

# LC dependencies - to be swapped with BAI dependencies
try:
    from langchain_community.document_loaders import UnstructuredMarkdownLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
except ModuleNotFoundError as e:
    raise ModuleNotFoundError(
        "Optional modules are not found.\nRun 'pip install \"beeai-framework[rag]\"' to install."
    ) from e


async def main() -> None:
    embedding_model = EmbeddingModel.from_name("openai:text-embedding-ada-002", truncate_input_tokens=500)

    # Document loading
    loader = UnstructuredMarkdownLoader(file_path="docs/modules/agents.mdx")
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=1000)
    all_splits = text_splitter.split_documents(docs)
    documents = [lc_document_to_document(document) for document in all_splits]
    print(f"Loaded {len(documents)} documents")

    vector_store: TemporalVectorStore = VectorStore.from_name(
        name="beeai:TemporalVectorStore", embedding_model=embedding_model
    )  # type: ignore[assignment]
    _ = await vector_store.add_documents(documents=documents)

    llm = OpenAIChatModel(model="gpt-4o-mini")
    agent = RAGAgent(llm=llm, memory=UnconstrainedMemory(), vector_store=vector_store)

    response = await agent.run(RagAgentRunInput(message=UserMessage("What agents are available in BeeAI?")))
    print(response.message.text)


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())
```
**Expected behavior**
The retrieval step should be instrumented and produce a retriever span.
**Additional context**
The BeeAIInstrumentor currently does not instrument retriever spans; only agents, chat models, and tools are traced automatically. To capture retriever spans (for example, when your RAG agent queries the vector store), you need to add manual instrumentation. The recommended approach is to wrap your retrieval logic with Phoenix decorators such as `@tracer.chain`, or with `tracer.start_as_current_span`, and set the span kind to `RETRIEVER` using the OpenInference semantic conventions. When attaching retrieved documents to a span, flatten the attributes (e.g. `retrieval.documents.{i}.document.id`, `retrieval.documents.{i}.document.content`) for best compatibility with the UI [details & example].
For a full example of manual instrumentation in a RAG workflow, including retrieval, chunking, prompt composition, and LLM calls, see the tracing_and_evals_weaviate.ipynb notebook. Manual instrumentation provides better UI integration and control than the generic OpenTelemetry packages [discussion].
If you need a template for instrumenting retrieval, let me know what your retrieval code looks like and I can help you adapt it!
Good observation. I will add support for them soon.