paper-qa
High accuracy RAG for answering questions from scientific documents with citations
Perhaps use `curie` or `sentence-transformer` for embeddings (relatively cheap anyway). DistilBERT might be good too. https://huggingface.co/deepset/deberta-v3-large-squad2
https://github.com/jerryjliu/gpt_index/issues/440
These were missing. Should make them available again.
EDIT: [a related OCR/NLP avenue](https://doi.org/10.48550/arXiv.2302.14045)
`string.printable` only covers ASCII. For other languages, maybe decode as Unicode instead, but detect the encoding before that (reference: https://chardet.readthedocs.io/en/latest/usage.html#basic-usage). By the way, is it necessary to check `maybe_is_text()` at all?
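To illustrate the ASCII point, here is a minimal sketch using only the standard library. The two helper names are hypothetical and are not paper-qa's actual `maybe_is_text()` implementation; they just compare a `string.printable` ratio with a Unicode-aware one on text that is already decoded to `str`. Encoding detection on raw bytes (e.g. with chardet) would have to happen before this step and is not shown.

```python
import string


def ascii_printable_ratio(text: str) -> float:
    """Fraction of characters found in string.printable (ASCII only).

    Hypothetical helper for illustration, not paper-qa's code.
    """
    if not text:
        return 0.0
    return sum(ch in string.printable for ch in text) / len(text)


def unicode_printable_ratio(text: str) -> float:
    """Fraction of characters that are printable per Unicode, or whitespace.

    str.isprintable() is False for newlines and tabs, so whitespace is
    counted separately with str.isspace().
    """
    if not text:
        return 0.0
    return sum(ch.isprintable() or ch.isspace() for ch in text) / len(text)


english = "Hello, world\n"
chinese = "你好，世界"

# The ASCII-based check accepts English but rejects Chinese outright.
print(ascii_printable_ratio(english), ascii_printable_ratio(chinese))
# The Unicode-aware check accepts both.
print(unicode_printable_ratio(english), unicode_printable_ratio(chinese))
```

With a check like the first one, any threshold above zero would reject the Chinese sample even though it is perfectly valid text, which is the concern raised above.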
Nice project btw!
In case people want a different cache or want to turn caching off.
What file format does `docs.add` expect?