byaldi
Unable to add Metadata to index
When I try to add metadata to an index, either as a list of metadata dicts or as a mapping of uid to metadata dict (shown below), the call always raises a KeyError.
Example:
import os
from glob import glob

from byaldi import RAGMultiModalModel

RAG = RAGMultiModalModel.from_pretrained("vidore/colpali", device='cuda:2', verbose=1)
# contains 29 1-page pdfs
files = glob(os.path.join('/dataset_pdf', '*.pdf'))
# generate simple unique ids
uids = list(range(len(files)))
# get the file names
report_ids = [file.split('/')[-1].split('.pdf')[0] for file in files]
# map each uid to its metadata dict
metadata = {uids[i]: {'file_name': report_ids[i]} for i in range(len(uids))}
RAG.index(
    input_path='dataset_pdf',
    index_name='Documents',  # index will be saved at index_root/index_name/
    doc_ids=uids,
    store_collection_with_index=True,
    overwrite=True,
    metadata=metadata,
)
This produces the following error:
report_ids = [file.split('/')[-1].split('.pdf')[0] for file in files]
metadata = {uids[i]: {'file_name':report_ids[i]} for i in range(len(uids))}
--> RAG.index(
input_path='dataset_pdf',
index_name='Documents', # index will be saved at index_root/index_name/
doc_ids=uids,
store_collection_with_index=True,
overwrite=True,
metadata=metadata,
)
File ~/miniconda3/envs/rag/lib/python3.9/site-packages/byaldi/RAGModel.py:111, in RAGMultiModalModel.index(self, input_path, index_name, doc_ids, store_collection_with_index, overwrite, metadata)
def index(
self,
input_path: Union[str, Path],
(...)
...
--> current_metadata = metadata[i] if metadata else None
if current_doc_id in self.doc_ids:
raise ValueError(f"Document ID {current_doc_id} already exists in the index")
KeyError: 0
Removing the metadata argument avoids the error. However, based on the metadata docstring in RAGMultiModalModel, passing metadata this way should be supported.
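To help narrow this down, here is a minimal sketch of the lookup I would expect for both input shapes. It does not use byaldi at all: `normalize_metadata` is a hypothetical helper I wrote for illustration, and the ids and file names are made up. It only shows that a list or a uid-to-dict mapping can be resolved by doc id rather than by position, which may or may not be what the library intends.

```python
from typing import Any, Dict, List, Optional, Sequence, Union

Metadata = Dict[str, Any]


def normalize_metadata(
    doc_ids: Sequence[int],
    metadata: Optional[Union[List[Metadata], Dict[int, Metadata]]],
) -> Dict[int, Optional[Metadata]]:
    """Hypothetical helper (not part of byaldi): return {doc_id: metadata}."""
    if metadata is None:
        # No metadata supplied: every document maps to None.
        return {doc_id: None for doc_id in doc_ids}
    if isinstance(metadata, dict):
        # Mapping form: look up by doc id, and fail with a clear message
        # instead of a bare KeyError when an id is missing.
        missing = [doc_id for doc_id in doc_ids if doc_id not in metadata]
        if missing:
            raise ValueError(f"No metadata found for doc_ids: {missing}")
        return {doc_id: metadata[doc_id] for doc_id in doc_ids}
    # List form: entries are matched to doc ids by position.
    if len(metadata) != len(doc_ids):
        raise ValueError(
            f"Got {len(metadata)} metadata entries for {len(doc_ids)} doc_ids"
        )
    return dict(zip(doc_ids, metadata))


# Made-up ids and names standing in for the 29 reports above.
uids = [0, 1, 2]
names = ["report_a", "report_b", "report_c"]

as_dict = normalize_metadata(uids, {u: {"file_name": n} for u, n in zip(uids, names)})
as_list = normalize_metadata(uids, [{"file_name": n} for n in names])
```

With this approach both input shapes resolve to the same per-document mapping, and a missing id produces a descriptive ValueError rather than KeyError: 0.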