byaldi issues

Dependency related modifications

Hello, the dependency on pdf2image and poppler-utils gave me some headaches because poppler-utils cannot be installed in my work environment. So I implemented custom classes that do...

erenirmak

Document ID 0 with page ID 1 already exists in the index

I used vidore/colqwen2-v1.0 for RAG on multiple PDF files with Streamlit. The Streamlit interface takes a zip file as input, unzips it, and runs RAG on the PDF files. I tried assigning the 'index_name' parameter...

josephmattamana90
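
One way this kind of collision can be avoided is to give every PDF in the uploaded archive its own document ID before anything is indexed. The snippet below is a minimal sketch using only the standard library; `extract_pdfs_with_ids` and its parameters are hypothetical names, not part of byaldi, and the sketch assumes the error comes from the same document ID being reused across files.

```python
import zipfile
from pathlib import Path


def extract_pdfs_with_ids(zip_path, out_dir, start_id=0):
    """Unzip an archive and pair each extracted PDF with a unique integer ID.

    Hypothetical helper: the name and signature are illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)

    # Sort so that the ID assignment is deterministic across runs.
    pdfs = sorted(
        p for p in out_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )

    # IDs start at start_id so a later upload can continue after the last
    # ID already present in an existing index.
    return {start_id + i: path for i, path in enumerate(pdfs)}
```

Each `(doc_id, path)` pair can then be handed to the indexing call one file at a time. Whether the indexing method accepts an explicit document ID, and under which parameter name, should be checked against the byaldi README for the installed version.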

How to cut a scanned PDF file into pieces and extract the content.

How to cut a scanned PDF file into pieces according to the content, such as the content on a newspaper, and extract the content of each piece and put it...

qingtian1771

Alternative to using in-memory collection

7

When I load from an index with `model = RAGMultiModalModel.from_index(index_path=index_name)`, I get the following message: > You are using in-memory collection. This means every image is stored in memory....

carstenj-eksponent

Add `tqdm` progress for file indexing

## Changes
- Added tqdm progress bar to show indexing progress when processing multiple files
- Removed redundant print statement since progress is now shown via tqdm

## Why
-...

dnth

Is the latest version 0.0.5 or 0.0.7?

PyPI is the source of truth, I suspect. I just want to make sure I'm looking at the right code in your repo as I'm testing this out to...

samgriek

Specifying cache directory for models

1

When I was going through the RAGMultiModalModel class's from_pretrained, I saw there is no mention of specifying a cache directory. However, ColPali engine supports that. It would be great if an additional...

Jainil-Gosalia
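
Until such a parameter exists, a possible workaround is to redirect the Hugging Face cache through the `HF_HOME` environment variable, which the Hugging Face libraries read when they are imported. This is a sketch only: `set_hf_cache_dir` is a hypothetical helper, and it assumes the model weights are downloaded through the standard Hugging Face Hub machinery.

```python
import os
from pathlib import Path


def set_hf_cache_dir(cache_dir):
    """Point the Hugging Face cache at cache_dir and return the resolved path.

    Hypothetical helper. Call it BEFORE importing transformers,
    huggingface_hub, or any library built on them, because the variable
    is read at import time.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(path)
    return path
```

Setting the variable in the shell or the deployment environment has the same effect and avoids any dependence on import order.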


Support Dify External Knowledge API

Add remove_from_index method

index() corruption
