GoldenRetriever example from README does not work
When running the following snippet from the README:
from relik.retriever import GoldenRetriever
encoder_name_or_path = "sapienzanlp/relik-retriever-e5-base-v2-aida-blink-encoder"
index_name_or_path = "sapienzanlp/relik-retriever-e5-base-v2-aida-blink-wikipedia-index"
retriever = GoldenRetriever(question_encoder=encoder_name_or_path, document_index=index_name_or_path, device="cuda:0")
retriever.retrieve("Michael Jordan was one of the best players in the NBA.", top_k=100)
the code fails with this error:
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "[...]/python3.12/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "[...]/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
^^^^^^^^^^^^^^^^^^^^^
File "[...]/python3.12/site-packages/relik/retriever/pytorch_modules/model.py", line 381, in default_collate_fn
_text = [sample[0] for sample in x]
~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable
Here is the full stack trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[13], line 1
----> 1 retriever.retrieve("Michael Jordan was one of the best players in the NBA.", top_k=100)
File[...]/lib/python3.12/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File[...]/lib/python3.12/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File[...]/lib/python3.12/site-packages/relik/retriever/pytorch_modules/model.py:356, in GoldenRetriever.retrieve(self, text, text_pair, input_ids, attention_mask, token_type_ids, k, max_length, precision, collate_fn, batch_size, num_workers, progress_bar, **kwargs)
354 try:
355 with get_autocast_context(self.device, precision):
--> 356 for batch in dataloader:
357 batch = batch.to(self.device)
358 question_encodings = self.question_encoder(**batch).pooler_output
File[...]/lib/python3.12/site-packages/torch/utils/data/dataloader.py:631, in _BaseDataLoaderIter.__next__(self)
628 if self._sampler_iter is None:
629 # TODO(https://github.com/pytorch/pytorch/issues/76750)
630 self._reset() # type: ignore[call-arg]
--> 631 data = self._next_data()
632 self._num_yielded += 1
633 if self._dataset_kind == _DatasetKind.Iterable and \
634 self._IterableDataset_len_called is not None and \
635 self._num_yielded > self._IterableDataset_len_called:
File[...]/lib/python3.12/site-packages/torch/utils/data/dataloader.py:1346, in _MultiProcessingDataLoaderIter._next_data(self)
1344 else:
1345 del self._task_info[idx]
-> 1346 return self._process_data(data)
File[...]/lib/python3.12/site-packages/torch/utils/data/dataloader.py:1372, in _MultiProcessingDataLoaderIter._process_data(self, data)
1370 self._try_put_index()
1371 if isinstance(data, ExceptionWrapper):
-> 1372 data.reraise()
1373 return data
File[...]/lib/python3.12/site-packages/torch/_utils.py:705, in ExceptionWrapper.reraise(self)
701 except TypeError:
702 # If the exception takes multiple arguments, don't try to
703 # instantiate since we don't know how to
704 raise RuntimeError(msg) from None
--> 705 raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "[...]/python3.12/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "[...]/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
^^^^^^^^^^^^^^^^^^^^^
File "[...]/python3.12/site-packages/relik/retriever/pytorch_modules/model.py", line 381, in default_collate_fn
_text = [sample[0] for sample in x]
~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable
I am running everything on Python 3.10.12 with torch 2.1.2+cu121.
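For anyone reading the traceback: the final TypeError can be reproduced in isolation with plain Python. The sketch below is not relik's code. `collate_first_fields` is a hypothetical stand-in that only mirrors the list comprehension shown at line 381 of the traceback, and `try_collate` is a helper added for illustration. It shows that this error appears when a sample reaching the collate function is None instead of an indexable item. It does not explain why the sample is None inside relik.

```python
def collate_first_fields(batch):
    # Hypothetical stand-in mirroring the comprehension from the traceback:
    #     _text = [sample[0] for sample in x]
    return [sample[0] for sample in batch]


def try_collate(batch):
    # Return the collated result, or the error message if indexing fails.
    try:
        return collate_first_fields(batch)
    except TypeError as err:
        return f"TypeError: {err}"


# A batch of indexable samples collates without problems.
print(try_collate([("Michael Jordan was one of the best players in the NBA.", None)]))

# A batch containing None triggers the same kind of TypeError as in the report.
print(try_collate([None]))
```

The second call prints a message ending in "object is not subscriptable", matching the last line of the traceback above.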
Getting the same error, related to this:
cannot import name 'GoldenRetriever'
from relik.retriever.pytorch_modules.model import GoldenRetriever
ImportError: cannot import name 'GoldenRetriever' from partially initialized module 'relik.retriever.pytorch_modules.model' (most likely due to a circular import) in retriever\pytorch_modules\model.py
It seems that the class is missing, while GoldenRetrieverModel is there.
Getting the same error when trying to run this on my fine-tuned model:
retriever.retrieve("Michael Jordan was one of the best players in the NBA.", top_k=100)
We just updated ReLiK to 1.0.7, which contains a fix for the issue. Let us know if it works now!
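To confirm that an environment actually picked up the fixed release, the installed version can be checked with the standard library. This is a minimal sketch, assuming the package is distributed under the name "relik". The helper names (`installed_version`, `version_tuple`, `has_fix`) are made up for illustration, and the version parsing is deliberately simplified: it keeps only the leading numeric components, so a pre-release such as "1.0.7rc1" is treated like "1.0.7".

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    # Return the installed version string, or None if the package is absent.
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    # Keep only the leading digits of each dot-separated component,
    # e.g. "2.1.2+cu121" -> (2, 1, 2). Simplified on purpose.
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(package="relik", minimum=(1, 0, 7)):
    # True only if the package is installed at or above the minimum version.
    found = installed_version(package)
    return found is not None and version_tuple(found) >= minimum


print("relik installed:", installed_version("relik"))
print("has 1.0.7 fix:", has_fix())
```

If the second line prints False, upgrading the package in the active environment is the first thing to try.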
@csaiedu I can't replicate this problem in a fresh local environment. Let me know if the problem persists.
Thank you Riccorl. Upgrading leads to the error "ValueError: source code string cannot contain null bytes" on Windows with a fresh environment.
Fixed by installing on a Linux machine with a new environment. It could be a corrupt conda env on Windows.
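If someone else hits the null-bytes error and suspects a corrupted install, one way to check is to scan the environment's Python files for NUL bytes. This is only a sketch using the standard library. `files_with_null_bytes` is a hypothetical helper, and the directory to scan (for example the environment's site-packages folder) is something you need to supply yourself.

```python
from pathlib import Path


def files_with_null_bytes(root):
    # Collect every .py file under root whose raw bytes contain a NUL byte.
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            data = path.read_bytes()
        except OSError:
            # Skip files that cannot be read (permissions, broken links).
            continue
        if b"\x00" in data:
            hits.append(path)
    return hits
```

Calling `files_with_null_bytes(...)` on the environment's site-packages directory returns the list of affected files. An empty list suggests the source files themselves are intact and the cause lies elsewhere.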
Thank you @Riccorl, it works just fine now
Thank you @Riccorl, it's also working for my fine-tuned model!
@csaiedu Can you share the full error stack?