pdf-processing topic

List pdf-processing repositories

PDFs-TextExtract

127
Stars
64
Forks
Watchers

Text extraction from multiple and large PDF documents.

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

doc-chatbot

788
Stars
132
Forks
7
Watchers

Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.

papermage

618
Stars
47
Forks
Watchers

Library supporting NLP and CV research on scientific papers.

pdf-to-text-chroma-search

17
Stars
6
Forks
Watchers

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations in a Chroma DB using GPT4All embeddings. Also provides a script to query the Chroma...