
EmptyDocsError: Not gathering evidence due to having no papers.

Open SURUIYUAN opened this issue 1 year ago • 3 comments

Why does it show "EmptyDocsError: Not gathering evidence due to having no papers."? Can it not detect my paper?

SURUIYUAN avatar Sep 24 '24 02:09 SURUIYUAN

The error "EmptyDocsError: Not gathering evidence due to having no papers." appears because the gather_evidence method in the GatherEvidence class checks whether any documents are present in state.docs.docs. If none are found, it raises EmptyDocsError. This means that no papers were detected or added to the state before gather_evidence was called [1].

To resolve this, ensure that papers are correctly added to state.docs.docs using methods like aadd_texts in the Docs class. This can be done through the paper_search function or by directly adding documents to the Docs object [1][2].
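As a self-contained illustration of the check described above, the sketch below uses simplified stand-in classes (the names mirror the snippet quoted here, but these stubs are not paper-qa's real implementation):

```python
# Stub illustration of the EmptyDocsError check. These classes are
# simplified stand-ins, not paper-qa's real GatherEvidence/EnvironmentState.
from dataclasses import dataclass, field


class EmptyDocsError(RuntimeError):
    """Raised when evidence gathering starts with no papers loaded."""


@dataclass
class Docs:
    docs: dict = field(default_factory=dict)  # maps dockey -> Doc


@dataclass
class EnvironmentState:
    docs: Docs = field(default_factory=Docs)


def gather_evidence(state: EnvironmentState) -> str:
    # The check that produces the reported error: nothing was ever
    # added to state.docs.docs before evidence gathering began.
    if not state.docs.docs:
        raise EmptyDocsError("Not gathering evidence due to having no papers.")
    return f"Gathering evidence over {len(state.docs.docs)} paper(s)."


state = EnvironmentState()
try:
    gather_evidence(state)  # raises, mirroring the reported behavior
except EmptyDocsError as err:
    print(err)

state.docs.docs["dockey-1"] = "fake-doc"  # simulate a successfully added paper
print(gather_evidence(state))  # no longer raises
```

In other words: the error is not about parsing failures inside a paper, it fires whenever the docs mapping is still empty at the moment gather_evidence runs.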

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 24 '24 02:09 dosubot[bot]

How do I modify this in a Python program? Please explain in detail.

SURUIYUAN avatar Sep 24 '24 02:09 SURUIYUAN

To ensure that papers are correctly added to state.docs.docs and avoid the "EmptyDocsError," you need to verify and possibly modify the paper_search method in the PaperSearch class. Here is a detailed explanation of the necessary steps:

  1. Ensure paper_search Method Updates state.docs.docs: The paper_search method should correctly update the state.docs.docs attribute by adding documents using the aadd_texts method. Here is the relevant part of the code:

    async def paper_search(
        self,
        query: str,
        min_year: int | None,
        max_year: int | None,
        state: EnvironmentState,
    ) -> str:
        # ... (other code)
    
        # combine all the resulting doc objects into one and update the state
        all_docs = []
        for r in results:
            this_doc = next(iter(r.docs.values()))
            all_docs.append(this_doc)
            await state.docs.aadd_texts(
                texts=r.texts,
                doc=this_doc,
                settings=self.settings,
                embedding_model=self.embedding_model,
            )
    
        status = state.status
        logger.info(status)
        # mark how far we've searched so that continuation will start at the right place
        self.previous_searches[search_key] += self.settings.agent.search_count
        if self.settings.agent.return_paper_metadata:
            retrieved_papers = "\n".join([f"{x.title} ({x.year})" for x in all_docs])
            return f"Retrieved Papers:\n{retrieved_papers}\n\n{status}"
        return status
    
  2. Verify results from index.query: Ensure that the results from the index.query call are not empty. If results is empty, no documents will be added to state.docs.docs, leading to the "EmptyDocsError".
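A way to make this failure mode visible earlier is to fail fast when the search comes back empty. The helper below is illustrative only; `results` is a plain list standing in for whatever `index.query` returns:

```python
def require_search_results(results: list, query: str) -> list:
    """Raise a descriptive error instead of a later, vaguer EmptyDocsError."""
    if not results:
        raise RuntimeError(
            f"paper_search found nothing for {query!r}; no documents will be "
            "added to state.docs.docs, so gather_evidence would fail with "
            "EmptyDocsError. Check the index and the paper directory."
        )
    return results


# Example: an empty search is caught immediately, with context.
try:
    require_search_results([], "PENS dataset")
except RuntimeError as err:
    print(err)

print(len(require_search_results(["doc-a", "doc-b"], "PENS dataset")))  # 2
```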

  3. Check aadd_texts Method: The aadd_texts method in the Docs class should correctly add texts to the docs dictionary. Here is the method signature and its parameters:

    async def aadd_texts(
        self,
        texts: list[Text],
        doc: Doc,
        settings: MaybeSettings = None,
        embedding_model: EmbeddingModel | None = None,
    ) -> bool:
    

    Ensure that this method is functioning correctly and that the texts and doc parameters are properly passed.

  4. Ensure Docs Class is Correctly Implemented: The Docs class should manage the docs dictionary and other related attributes properly. Here is a snippet of the Docs class for reference:

    class Docs(BaseModel):
        """A collection of documents to be used for answering questions."""
    
        model_config = ConfigDict(extra="forbid")
    
        id: UUID = Field(default_factory=uuid4)
        docs: dict[DocKey, Doc | DocDetails] = Field(default_factory=dict)
        texts: list[Text] = Field(default_factory=list)
        docnames: set[str] = Field(default_factory=set)
        texts_index: VectorStore = Field(default_factory=NumpyVectorStore)
        name: str = Field(default="default", description="Name of this docs collection")
        index_path: Path | None = Field(
            default=PAPERQA_DIR, description="Path to save index", validate_default=True
        )
        deleted_dockeys: set[DocKey] = Field(default_factory=set)
    
        @field_validator("index_path")
        @classmethod
        def handle_default(cls, value: Path | None, info: ValidationInfo) -> Path | None:
            if value == PAPERQA_DIR:
                return PAPERQA_DIR / info.data["name"]
            return value
    
        def clear_docs(self) -> None:
            self.texts = []
            self.docs = {}
            self.docnames = set()
    
        def _get_unique_name(self, docname: str) -> str:
            """Create a unique name given proposed name."""
            suffix = ""
            while (docname + suffix) in self.docnames:
                suffix = "a" if suffix == "" else chr(ord(suffix) + 1)
            docname += suffix
            return docname
    
        async def aadd_file(
            self,
            file: BinaryIO,
            citation: str | None = None,
            docname: str | None = None,
            dockey: DocKey | None = None,
            title: str | None = None,
            doi: str | None = None,
            authors: list[str] | None = None,
            settings: MaybeSettings = None,
            llm_model: LLMModel | None = None,
            embedding_model: EmbeddingModel | None = None,
            **kwargs,
        ) -> str | None:
            """Add a document to the collection."""
            # just put in temp file and use existing method
            suffix = ".txt"
            if maybe_is_pdf(file):
                suffix = ".pdf"
            elif maybe_is_html(file):
                suffix = ".html"
    
            with tempfile.NamedTemporaryFile(suffix=suffix) as f:
                f.write(file.read())
                f.seek(0)
                return await self.aadd(
                    Path(f.name),
                    citation=citation,
                    docname=docname,
                    dockey=dockey,
                    title=title,
                    doi=doi,
                    authors=authors,
                    settings=settings,
                    llm_model=llm_model,
                    embedding_model=embedding_model,
                    **kwargs,
                )
    
        async def aadd(
            self,
            path: Path,
            citation: str | None = None,
            docname: str | None = None,
            dockey: DocKey | None = None,
            title: str | None = None,
            doi: str | None = None,
            authors: list[str] | None = None,
            settings: MaybeSettings = None,
            llm_model: LLMModel | None = None,
            embedding_model: EmbeddingModel | None = None,
            **kwargs,
        ) -> str | None:
            """Add a document to the collection."""
            all_settings = get_settings(settings)
            parse_config = all_settings.parsing
            if dockey is None:
                dockey = md5sum(path)
            if llm_model is None:
                llm_model = all_settings.get_llm()
            if citation is None:
                texts = read_doc(
                    path,
                    Doc(docname="", citation="", dockey=dockey),  # Fake doc
                    chunk_chars=parse_config.chunk_size,
                    overlap=parse_config.overlap,
                )
                if not texts:
                    raise ValueError(f"Could not read document {path}. Is it empty?")
                result = await llm_model.run_prompt(
                    prompt=parse_config.citation_prompt,
                    data={"text": texts[0].text},
                    skip_system=True,
                )
                citation = result.text
                if (
                    len(citation) < 3
                    or "Unknown" in citation
                    or "insufficient" in citation
                ):
                    citation = f"Unknown, {os.path.basename(path)}, {datetime.now().year}"
            if docname is None:
                match = re.search(r"([A-Z][a-z]+)", citation)
                if match is not None:
                    author = match.group(1)
                else:
                    raise ValueError(
                        f"Could not parse docname from citation {citation}. "
                        "Consider just passing key explicitly - e.g. docs.py "
                        "(path, citation, key='mykey')"
                    )
                year = ""
                match = re.search(r"(\d{4})", citation)
                if match is not None:
                    year = match.group(1)
                docname = f"{author}{year}"
            docname = self._get_unique_name(docname)
    
            doc = Doc(docname=docname, citation=citation, dockey=dockey)
            self.docs[dockey] = doc
            self.docnames.add(docname)
            self.texts.extend(texts)
            self._build_texts_index()
            return docname
    

By ensuring that the paper_search method correctly updates state.docs.docs and verifying the implementation of the Docs class and its methods, you can avoid the "EmptyDocsError" and ensure that papers are correctly detected and added to the state [1][2].
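The docname logic quoted above is easy to sanity-check in isolation. The sketch below reproduces the `_get_unique_name` suffixing and the author/year regexes from `aadd` as standalone functions (same regexes, simplified plumbing):

```python
import re


def get_unique_name(docname: str, existing: set[str]) -> str:
    """Mirrors Docs._get_unique_name: append a, b, c, ... until unique."""
    suffix = ""
    while (docname + suffix) in existing:
        suffix = "a" if suffix == "" else chr(ord(suffix) + 1)
    return docname + suffix


def docname_from_citation(citation: str) -> str:
    """Mirrors the docname parsing in aadd: first capitalized word plus year."""
    match = re.search(r"([A-Z][a-z]+)", citation)
    if match is None:
        raise ValueError(f"Could not parse docname from citation {citation}")
    author = match.group(1)
    year_match = re.search(r"(\d{4})", citation)
    year = year_match.group(1) if year_match is not None else ""
    return f"{author}{year}"


print(docname_from_citation("Smith et al., A Study of Things, 2023"))  # Smith2023
print(get_unique_name("Smith2023", {"Smith2023"}))  # Smith2023a
```

This also shows why a citation with no capitalized word (or one the LLM returned as "Unknown, ...") can derail docname generation well before evidence gathering runs.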

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 24 '24 02:09 dosubot[bot]

Hi there, I hit the same problem. Did you solve the 'Not gathering evidence due to having no papers' problem by following Dosu's suggestions? Could you please let us know what the key was? Thanks.

wight911 avatar Feb 03 '25 03:02 wight911

I attached my output info below. There is a PDF file in my_papers/. The OpenAI API works correctly, but when I changed the model to ollama/llama3.2, it stopped working properly.

I put some print(...) calls inside 'async def paper_search(...)', but their output never appears. It seems paper_search() is never called.

OUTPUT INFO:

(paperqa) D:\git-base\paper-qa\my_papers>pqa --settings my_local_settings ask "what is PENS dataset?"

    C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\pydantic\_internal\_config.py:345: UserWarning: Valid config keys have changed in V2:
      * 'fields' has been removed
      warnings.warn(message, UserWarning)
    entering main ---------
    cli_settings: CliSettingsSource(env_nested_delimiter='.', env_prefix_len=0)
    args: Namespace(settings='my_local_settings', index='default', command='ask', query='what is PENS dataset?')
    remaining_args: []
    Could not find cost for model ollama/llama3.2.
    Encountered exception during tool call for tool gather_evidence: EmptyDocsError('Not gathering evidence due to having no papers.')
    [12:32:16] Trajectory failed.
    ╭─ Traceback (most recent call last) ─╮
    │ C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\litellm\main.py:463 in acompletion
    │   460 │   │   │   │   response = ModelResponse(**init_response)
    │   461 │   │   │   response = init_response
    │   462 │   │   elif asyncio.iscoroutine(init_response):
    │ ❱ 463 │   │   │   response = await init_response
    │   464 │   │   else:

.......(truncated for brevity) ......

    │   265 │   │   │   │   │   │   },
    │   266 │   │   │   │   │   │   "type": "function",
    ╰─────────────────────────────────╯

KeyError: 'name'

During handling of the above exception, another exception occurred:

    ╭─ Traceback (most recent call last) ─╮
    │ C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\paperqa\agents\main.py:157 in _run_with_timeout_failure
    │   154 ) -> tuple[PQASession, AgentStatus]:
    │   155 │   try:
    │   156 │   │   async with asyncio.timeout(settings.agent.timeout):

.......(truncated for brevity) ......

    │ 2169 │   │   │   │   │   ),
    ╰─────────────────────────────────╯
APIConnectionError: litellm.APIConnectionError: 'name'
Traceback (most recent call last):
  File "C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\litellm\main.py", line 463, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\litellm\llms\custom_httpx\llm_http_handler.py", line 177, in async_completion
    return provider_config.transform_response(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\litellm\llms\ollama\completion\transformation.py", line 263, in transform_response
    "name": function_call["name"],
            ~~~~~~~~~~~~~^^^^^^^^
KeyError: 'name'

       Received Model Group=ollama/llama3.2
       Available Model Group Fallbacks=None LiteLLM Retried: 2 times, LiteLLM Max Retries: 3

[12:32:24] Generating answer for 'what is PENS dataset?'.
Could not find cost for model ollama/llama3.2.
[12:35:38] Status: Paper Count=0 | Relevant Papers=0 | Current Evidence=0 | Current Cost=$0.0000
Answer: I cannot provide information about the PENS dataset as it is not a widely recognized or well-documented dataset. Can I help you with something else?

(paperqa) D:\git-base\paper-qa\my_papers>

wight911 avatar Feb 03 '25 04:02 wight911

Hello @SURUIYUAN and @wight911,

@SURUIYUAN, do you mind sharing minimal code to reproduce the problem? It is possible that your settings.agent.index.paper_directory points to the wrong directory. Please check this tutorial, where we show how to work with the Settings class.
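For reference, a minimal sketch of pointing paper-qa at a local papers folder (the exact field names can vary across paper-qa versions, so treat this as an assumption to verify against your installed release; the path shown is the directory from the log above):

```python
# Hypothetical configuration sketch -- verify field names against your
# installed paper-qa version before relying on them.
from paperqa import Settings, ask

settings = Settings(paper_directory="D:/git-base/paper-qa/my_papers")
# In some versions the same path is nested under the agent settings:
#   settings.agent.index.paper_directory
answer = ask("what is PENS dataset?", settings=settings)
```

If the directory is wrong or empty, no papers are indexed and gather_evidence will raise EmptyDocsError, which matches the symptoms reported here.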

@wight911 It seems that the CLI worked well with OpenAI's models, is that correct? We know that ollama/llama3.2 sometimes fails to follow the required schema, which leads to errors like this. Please check this discussion.

maykcaldas avatar Mar 13 '25 17:03 maykcaldas

Hi all to circle back on a few things here:

Could not find cost for model ollama/llama3.2.

This is a litellm missing cost entry: https://github.com/BerriAI/litellm/issues/9644

APIConnectionError: litellm.APIConnectionError: 'name'
Traceback (most recent call last):
  File "C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\litellm\main.py", line 463, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\litellm\llms\custom_httpx\llm_http_handler.py", line 177, in async_completion
    return provider_config.transform_response(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wight\anaconda3\envs\paperqa\Lib\site-packages\litellm\llms\ollama\completion\transformation.py", line 263, in transform_response
    "name": function_call["name"],
            ~~~~~~~~~~~~~^^^^^^^^
KeyError: 'name'

This is a litellm bug: https://github.com/BerriAI/litellm/issues/8594

Hopefully they get fixed, but these are not paper-qa issues; they're issues in our upstream dependencies.

jamesbraza avatar Mar 30 '25 01:03 jamesbraza