
Pinpoint raw evidence chunks

Open athena-yao opened this issue 1 year ago • 1 comment

I want to be able to pinpoint the exact sentence(s) in a document where data was extracted from so that I can cross-check for accuracy. answer.context() gives summarized chunks (after they've been fed into the summary LLM) but not the raw data. I've also tried implementing await docs.aget_evidence and gather_evidence without success. How can I get the raw evidence chunks associated with answers?

athena-yao avatar Sep 23 '24 19:09 athena-yao

Hey @athena-yao! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!

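To get the raw evidence chunks associated with answers, you can run the GatherEvidence tool (which calls aget_evidence internally) and read the source text from each context it collects. Each context keeps the LLM-written summary in its context field and the original chunk in its text field, so the raw passage is available as context.text.text. Here is an example of how you can use it: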

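import asyncio

# Import paths as of paper-qa v5; they may differ in other releases.
from paperqa import Docs, Settings
from paperqa.agents.tools import EnvironmentState, GatherEvidence
from paperqa.llms import EmbeddingModel, LiteLLMModel
from paperqa.types import Answer, Text

async def get_raw_evidence(docs: Docs, query: str, settings: Settings, embedding_model: EmbeddingModel, summary_llm_model: LiteLLMModel) -> list[Text]:
    state = EnvironmentState(docs=docs, answer=Answer(question=query))
    gather_evidence_tool = GatherEvidence(settings=settings, embedding_model=embedding_model, summary_llm_model=summary_llm_model)
    await gather_evidence_tool.gather_evidence(query, state)
    # context.context holds the LLM summary; context.text is the source chunk (its .text attribute is the raw string)
    return [context.text for context in state.answer.contexts]

# Usage
async def main() -> None:
    docs = Docs()  # Add your documents first, e.g. with await docs.aadd(...)
    query = "Your question here"
    settings = Settings()  # Initialize your Settings object
    embedding_model = settings.get_embedding_model()  # EmbeddingModel itself is abstract and cannot be instantiated
    summary_llm_model = settings.get_summary_llm()
    raw_evidence_chunks = await get_raw_evidence(docs, query, settings, embedding_model, summary_llm_model)
    print([chunk.text for chunk in raw_evidence_chunks])

asyncio.run(main())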

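This will give you the raw chunks that each summary was generated from. Note that the summarization step still runs; the original text is simply kept alongside each summary, so you can compare the two [1][2].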
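For cross-checking, it can help to print each summary next to the chunk it came from. The following is only a minimal sketch: the Text and Context dataclasses are simplified stand-ins that mirror the fields discussed here (they are not the library's classes), pair_summaries_with_sources is a hypothetical helper name, and the sample data is made up for illustration. With paper-qa installed, you would pass state.answer.contexts instead of the example list.

```python
from dataclasses import dataclass


# Simplified stand-ins (assumption): only the fields used below are modeled.
@dataclass
class Text:
    text: str  # raw chunk as parsed from the document
    name: str  # chunk identifier, e.g. a citation key plus page range


@dataclass
class Context:
    context: str  # summary written by the summary LLM
    text: Text  # source chunk the summary was generated from
    score: int = 0


def pair_summaries_with_sources(contexts: list[Context]) -> list[dict[str, str]]:
    """Return one record per context with the summary next to its raw chunk."""
    return [
        {"chunk": c.text.name, "summary": c.context, "raw": c.text.text}
        for c in contexts
    ]


# Made-up sample data, for illustration only.
example = [
    Context(
        context="The drug reduced symptoms in most patients.",
        text=Text(
            text="In the trial, 72 of 100 patients reported fewer symptoms.",
            name="Smith2020 pages 3-4",
        ),
        score=8,
    )
]

records = pair_summaries_with_sources(example)
for record in records:
    print(f"[{record['chunk']}]")
    print(f"  summary: {record['summary']}")
    print(f"  raw:     {record['raw']}")
```

Keeping the chunk name in each record makes it easier to locate the passage in the original document when a summary looks suspicious.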

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 23 '24 19:09 dosubot[bot]