kotaemon
# [Feature] add structured output to openai
## Description
- Adds a `StructuredOutputChatOpenAI` class so downstream applications can consume structured JSON output from the LLM.
### Simple example usage
```python
import json
import os

from pydantic import BaseModel

from kotaemon.llms import StructuredOutputChatOpenAI

class StructuredAnswer(BaseModel):
    answer: str

structured_llm = StructuredOutputChatOpenAI(
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
    temperature=1,
    api_key=os.environ.get("OPENAI_API_KEY"),
    response_schema=StructuredAnswer,
)

# inside an async context
answer = await structured_llm.ainvoke("Hello how are you?")
print(json.loads(answer.content))
# -> {'answer': "I'm just a computer program, but I'm here and ready to help you! How can I assist you today?"}
```
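Because the response is constrained to the schema, it can also be validated back into the Pydantic model rather than loaded as a plain dict. A minimal sketch, assuming Pydantic v2 (`model_validate_json`); the `content` string below is a stand-in for `answer.content`:

```python
from pydantic import BaseModel

class StructuredAnswer(BaseModel):
    answer: str

# stand-in for answer.content returned by the structured LLM
content = '{"answer": "Ready to help!"}'

# parse and type-check in one step (Pydantic v2)
parsed = StructuredAnswer.model_validate_json(content)
print(parsed.answer)
# -> Ready to help!
```

This gives typed attribute access (`parsed.answer`) and raises a `ValidationError` if the model output ever drifts from the schema.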
### Example usage in a retrieval pipeline
```python
import json
import os

from pydantic import BaseModel

from kotaemon.embeddings.openai import OpenAIEmbeddings
from kotaemon.indices.qa.citation_qa import AnswerWithContextPipeline
from kotaemon.indices.qa.format_context import PrepareEvidencePipeline
from kotaemon.indices.rankings import LLMTrulensScoring
from kotaemon.llms.chats.openai import ChatOpenAI, StructuredOutputChatOpenAI
from kotaemon.storages.docstores import LanceDBDocumentStore
from kotaemon.storages.vectorstores import ChromaVectorStore
from ktem.ktem.index.file.pipelines import DocumentRetrievalPipeline
from ktem.ktem.reasoning.simple import FullQAPipeline

# document store
app_dir = "<path to your app data>/kotaemon/ktem_app_data/"
user_data_dir = app_dir + "user_data/"
doc_store_dir = user_data_dir + "docstore/"
doc_store = LanceDBDocumentStore(path=doc_store_dir, collection_name="index_1")

# vector store
vector_store_dir = user_data_dir + "vectorstore"
vector_store = ChromaVectorStore(path=vector_store_dir, collection_name="index_1")

# plain LLM used for relevance scoring
llm = ChatOpenAI(
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.environ.get("OPENAI_API_KEY"),
)
llm_scorer = LLMTrulensScoring(llm=llm)

# embeddings
embedding = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model="text-embedding-ada-002",
    api_key=os.environ.get("OPENAI_API_KEY"),
    context_length=8191,
)

# document retrieval pipeline
document_retrieval = DocumentRetrievalPipeline(
    embedding=embedding,
    retrieval_mode="vector",  # can be "vector" or "text"
    vector_store=vector_store,
    doc_store=doc_store,
    top_k=5,
    rerankers=[],  # optional rerankers, e.g. [cohere_reranking]
    llm_scorer=llm_scorer,
)

# pipeline that formats retrieved content into evidence
evidence_pipeline = PrepareEvidencePipeline()

# structured-output LLM used for answering
class StructuredAnswer(BaseModel):
    answer: str

structured_llm = StructuredOutputChatOpenAI(
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
    temperature=1,
    api_key=os.environ.get("OPENAI_API_KEY"),
    response_schema=StructuredAnswer,
)

# answer questions with the provided evidence
answer_pipeline = AnswerWithContextPipeline(
    llm=structured_llm,
    qa_template=(
        "Context: \n{context}\n\n"
        "{question}\n"
    ),
)

qa_pipeline = FullQAPipeline(
    retrievers=[document_retrieval],
    evidence_pipeline=evidence_pipeline,
    answering_pipeline=answer_pipeline,
)

prompt = "This is a prompt"
# fetch relevant documents and answer the question
answer, scored_docs = qa_pipeline.invoke(prompt, document_ids=[])
parsed_answer = json.loads(answer.content)
```
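LLMs can occasionally emit output that is not valid JSON, so downstream consumers may want to guard the `json.loads` call. A minimal sketch; the `parse_structured` helper is hypothetical and not part of the kotaemon API:

```python
import json

def parse_structured(content: str) -> dict:
    """Parse LLM output as JSON; fall back to an error marker on failure.
    Hypothetical helper, not part of kotaemon."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return {"error": "non-JSON model output", "raw": content}

print(parse_structured('{"answer": "42"}'))  # -> {'answer': '42'}
print(parse_structured("not json"))  # -> {'error': 'non-JSON model output', 'raw': 'not json'}
```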
## Type of change
- [x] New features (non-breaking change).
- [ ] Bug fix (non-breaking change).
- [ ] Breaking change (fix or feature that would cause existing functionality not to work as expected).
## Checklist
- [x] I have performed a self-review of my code.
- [ ] I have added thorough tests if it is a core feature.
- [x] There is a reference to the original bug report and related work.
- [x] I have commented on my code, particularly in hard-to-understand areas.
- [x] The feature is well documented.