pdf-processing topic
pdf-processing repositories
PDFs-TextExtract
127 Stars, 64 Forks
Multiple and Large PDF Documents Text Extraction.
document-processing-pipeline-for-regulated-industries
60 Stars, 13 Forks
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
doc-chatbot
817 Stars, 136 Forks, 7 Watchers
Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.
papermage
618 Stars, 47 Forks
Library supporting NLP and CV research on scientific papers.
pdf-to-text-chroma-search
21 Stars, 7 Forks
Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. It also provides a script to query the Chroma...