paper-qa
High accuracy RAG for answering questions from scientific documents with citations
As of v5.2, let's say we: 1. Create a `Docs` object, and use it to gather evidence, which populates `embeddings` in `Text`s 2. Change the embedding model 3. What can...
Hello, I'm looking for an optimal chunking strategy to get relevant answers for my queries. I tried these parameters provided in the "high quality config": - chunk size: 7000...
Problem: I'm encountering `ModuleNotFoundError: No module named 'paperqa.version'` when attempting to run the PaperQA project. Upon investigation, I've determined that the paperqa/version.py file is missing from the project...
Hello, I read NumpyVectorStore can be used as embedding model in the doc. > Therefore, NumpyVectorStore is a good place to start, it's a simple in-memory store, without an index....
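For readers unfamiliar with the phrase "a simple in-memory store, without an index" quoted above, here is a minimal sketch of the general idea. `TinyVectorStore` and its methods are hypothetical names invented for illustration; this is not paper-qa's `NumpyVectorStore` and makes no claim about its API. It only shows that such a store holds vectors, which some separate embedding model must have produced, and scans all of them on every query.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs in a list.

    There is no index, so every search compares the query against all
    stored vectors (a brute-force scan).
    """

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        # The vector is assumed to come from a separate embedding model.
        self._items.append((text, list(vector)))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, k=1):
        # Score every stored vector, then return the k most similar texts.
        scored = [(self._cosine(query_vector, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]
```

The store never creates embeddings itself: it only keeps and compares the vectors it is given.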
It would be useful to avoid locally storing huge folders of PDFs/HTML/text files, instead reading them directly from the cloud. This would remove the need for people to spend 30+...
https://github.com/Future-House/paper-qa/pull/526 added the ability to defer embeddings when we don't need them. A natural extension of this is to speed up index builds by: 1. Deferring embedding creation from index...
As of `paper-qa==5.2.0`, within `parse_pdf_to_pages` we discard an entire document if any of its pages encounter an `ImpossibleParsingError`. Most of the time we hit an `ImpossibleParsingError`, it's due to failing...
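The all-or-nothing behavior described above can be sketched next to one possible per-page alternative. Everything here is hypothetical: `PageParseError` stands in for a per-page failure, and `parse_all_or_nothing`, `parse_tolerant`, and `demo_parse_page` are illustrative names. None of this is paper-qa's code, and the tolerant variant is only one conceivable approach, not necessarily what the issue goes on to propose.

```python
class PageParseError(Exception):
    """Hypothetical stand-in for a failure to parse a single page."""


def demo_parse_page(raw):
    # Toy parser for illustration: an empty page counts as unparseable.
    if not raw:
        raise PageParseError("page has no extractable text")
    return raw.upper()


def parse_all_or_nothing(pages, parse_page):
    # One failing page raises out of the loop, so the caller loses
    # every page of the document, including the ones that parsed fine.
    return {num: parse_page(raw) for num, raw in pages.items()}


def parse_tolerant(pages, parse_page):
    # Keep the pages that parse and record the numbers of those that fail.
    parsed, failed = {}, []
    for num, raw in pages.items():
        try:
            parsed[num] = parse_page(raw)
        except PageParseError:
            failed.append(num)
    return parsed, failed
```

With `pages = {1: "abc", 2: "", 3: "def"}`, the first function raises on page 2, while the second returns pages 1 and 3 along with the list of failed page numbers, leaving the caller to decide whether the document is still usable.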
As of `paper-qa==5.2.0`, we don't include the DOI in our index when adding a document: https://github.com/Future-House/paper-qa/blob/v5.2.0/paperqa/agents/search.py#L439-L447 We should consider supporting DOI for filtering and validation as well.
This PR improves consistency across readers: - Tests equation parsing of Docling and PyMuPDF - Renames PyMuPDF's `image_dpi` arg to `dpi`, since (1) this parameter impacts tables too and (2)...
Corresponds to #700, #446 > [!NOTE] > Add optional `response_schema` to `Docs.aquery` and pass it to `llm_model.call_single` as `response_format` for answer generation. > > Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit...