[Question] Streaming chunks of text to TTS
I would like to know how I can handle chunks of text while still preserving punctuation. I tried this and am getting errors. It has also increased latency from ~1s to close to ~4s, which is quite long for a conversation. Thanks. This is how I am working with it when adding context:
```python
async def enrich_with_rag(agent: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    global pdf_index
    if pdf_index is None:
        logger.warning("RAG system not initialized. Skipping enrichment.")
        return
    user_msg = chat_ctx.messages[-1]
    try:
        # Use LlamaIndex to query the data
        query_engine = pdf_index.as_query_engine(
            llm=groqLLM,
            streaming=True,  # Adjust model as needed
            similarity_top_k=3
        )
        response = await asyncio.to_thread(query_engine.query, user_msg.content)
        if response:
            # Collect the entire streamed response
            relevant_text = ""
            for text in response:
                relevant_text += text
            logger.info(f"Enriching with RAG: {relevant_text[:100]}...")  # Log first 100 chars
            # Insert RAG context before the user's message
            chat_ctx.messages.insert(-1, llm.ChatMessage.create(
                text=f"Context:\n{relevant_text}",
                role="assistant",
            ))
        # Create a streaming LLM response
        llm_stream = agent.llm.chat(chat_ctx)
        return llm_stream
    except Exception as e:
        logger.error(f"Error during RAG enrichment: {str(e)}")
```
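
To make the "chunks while keeping punctuation" part concrete, here is a minimal sketch of sentence-boundary buffering. It is only an illustration in plain Python: `sentence_chunks` is a hypothetical helper, not part of LiveKit or LlamaIndex, and it assumes the streamed pieces arrive as plain strings. The idea is to accumulate fragments and emit text only when a sentence ends, so each chunk could go to TTS as soon as it is complete instead of after the whole response has been collected.

```python
import re
from typing import Iterable, Iterator

# Whitespace that directly follows ., ! or ? is treated as a safe place to cut.
# This is a simple heuristic: abbreviations such as "Dr. Smith" will also split.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(fragments: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences (punctuation included) from streamed fragments."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one ends in sentence punctuation.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may be an unfinished sentence, so keep buffering it.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    streamed = ["Hello", " wor", "ld. How", " are", " you?", " Fine"]
    for chunk in sentence_chunks(streamed):
        print(chunk)
```

A sentence is only emitted once the whitespace after its closing punctuation has arrived, so the final sentence of a stream is flushed when the stream ends. Numbers such as `3.5` are not split because no whitespace follows the period.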