byaldi
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Hello, the dependency on pdf2image and poppler-utils gave me some headaches because it is not possible to install poppler-utils in my work environment. So I implemented custom classes that do...
I used vidore/colqwen2-v1.0 for RAG on multiple PDF files with Streamlit. The Streamlit interface takes a zip file as input, unzips it, and runs RAG on the PDF files. I tried assigning the 'index_name' parameter...
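The unzip step of that workflow can be done with the standard library alone. The sketch below assumes the upload arrives as raw bytes (as a Streamlit file uploader provides); `extract_pdfs` is a hypothetical helper, and the directory it fills would then be passed to whatever indexing call the app uses.

```python
import io
import zipfile
from pathlib import Path


def extract_pdfs(zip_bytes: bytes, dest: Path) -> list[Path]:
    """Extract only the .pdf members of an uploaded zip archive into dest.

    Hypothetical helper, not part of byaldi.
    """
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            # Skip folders and anything that is not a PDF (case-insensitive)
            if info.is_dir() or not info.filename.lower().endswith(".pdf"):
                continue
            # Keep only the base name: flattens nested folders and stops a
            # crafted member name such as "../x.pdf" from escaping dest
            target = dest / Path(info.filename).name
            with zf.open(info) as src, open(target, "wb") as out:
                out.write(src.read())
            extracted.append(target)
    return sorted(extracted)
```

Flattening keeps the output directory simple, but two PDFs with the same name in different folders of the archive would overwrite each other; prefixing the folder name is one way around that.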
How can I cut a scanned PDF file into pieces according to its content (for example, the articles on a newspaper page), extract the content of each piece, and put it...
When I load from an index with `model = RAGMultiModalModel.from_index(index_path=index_name)`, I get the following message: > You are using in-memory collection. This means every image is stored in memory....
Changes: added a tqdm progress bar to show indexing progress when processing multiple files; removed a redundant print statement, since progress is now shown via tqdm. Why: ...
PyPI is the source of truth, I suspect. I just want to make sure I'm looking at the right code in your repo as I'm testing this out to...
While going through the `from_pretrained` method of the `RAGMultiModalModel` class, I noticed there is no way to specify a cache directory, even though the ColPali engine supports that. It would be great if an additional...
Dify is a powerful LLM app development platform with a built-in RAG system. Dify supports external knowledge bases by exposing an API to other RAG systems. The API specification:...
Description: adds the ability to remove a document from the index, following #38. Test: tested on a DB of 10 documents with 932 pages in total and obtained...
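The PR's actual implementation is not shown here. As a hedged illustration of what removing a document from an in-memory multi-vector index involves, the sketch below stores one embedding per page in a list with a parallel embed-id to doc-id list; `ToyIndex` and all its fields are hypothetical and do not mirror byaldi's internals. The point it shows is that deleting a document's pages shifts every later position, so all position-keyed structures must be rebuilt together.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory index: one entry per page embedding."""
    embeddings: list = field(default_factory=list)    # position == embed id
    embed_to_doc: list = field(default_factory=list)  # embed id -> doc id
    doc_metadata: dict = field(default_factory=dict)  # doc id -> metadata

    def add_page(self, doc_id, embedding, metadata=None):
        self.embeddings.append(embedding)
        self.embed_to_doc.append(doc_id)
        if metadata is not None:
            self.doc_metadata[doc_id] = metadata

    def remove_document(self, doc_id):
        """Drop every page of doc_id; return how many pages were removed."""
        if doc_id not in self.embed_to_doc:
            raise KeyError(f"document {doc_id!r} is not indexed")
        keep = [i for i, d in enumerate(self.embed_to_doc) if d != doc_id]
        removed = len(self.embed_to_doc) - len(keep)
        # Rebuild both parallel lists from the same positions so that
        # embed id i still refers to the same page in each of them
        self.embeddings = [self.embeddings[i] for i in keep]
        self.embed_to_doc = [self.embed_to_doc[i] for i in keep]
        self.doc_metadata.pop(doc_id, None)
        return removed
```

A persisted index would additionally need the compacted structures written back to disk after removal.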
Hi, I have been trying to run the indexing on a set of 80 PDF documents (~150 pages each) by submitting batch jobs. Since the indexing took longer than expected...
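When indexing is spread over several batch jobs, each job needs a deterministic, non-overlapping slice of the corpus. A minimal sketch, assuming each job is told its own index (for example, a scheduler array-task id) and builds its own index from its slice; `shard` is a hypothetical helper, and merging the per-job indexes afterwards is not covered.

```python
def shard(paths, num_jobs, job_index):
    """Return the files that batch job `job_index` (0-based) should index."""
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    if not 0 <= job_index < num_jobs:
        raise ValueError("job_index must be in [0, num_jobs)")
    # Sort first so every job computes the same split from the same input
    ordered = sorted(paths)
    # Round-robin assignment: job k takes items k, k + num_jobs, ...
    return ordered[job_index::num_jobs]
```

Round-robin keeps the per-job file counts within one of each other; with documents of very different lengths, balancing by page count would even out run times better.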