
lucene 10 (needs to be between 7 and 9): org.apache.lucene.index.IndexFormatTooNewException when using a self-created corpus

Open Reijarmo opened this issue 2 years ago • 0 comments

Hello all,

I tried to use Bertserini for question answering with a self-created corpus. The base example works perfectly (with transformers == 3.4.0), but I cannot find a solution to the Lucene problem. I know Bertserini depends on Lucene 8, while pyserini switched to Lucene 9 in its latest version, so I installed pyserini 0.16.0 (https://pypi.org/project/pyserini/0.16.0/) in a separate conda environment and created a new index with it, but the problem stays the same.

When I tried to build an index with the pyserini version that came with the bertserini installation, I was stopped by "/home/user/anaconda3/envs/bertserini/bin/python: No module named pyserini.index.lucene". The only solution I found for that is upgrading pyserini, which isn't an option because of the base bertserini problem.
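To make the two environments easier to compare, here is a minimal sketch that reports which pyserini version an environment has and whether the `pyserini.index.lucene` module can be found in it. It only uses the Python standard library; the helper names `installed_version` and `has_module` are made up for illustration and are not part of pyserini or bertserini. Note that locating a submodule imports its parent packages first, so the check may be slow or fail for unrelated reasons, in which case it simply reports False.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_module(name):
    """Return True if the module `name` can be located in this environment."""
    try:
        return util.find_spec(name) is not None
    except Exception:
        # find_spec imports parent packages; a missing or broken parent
        # ends up here and is treated as "module not available".
        return False


if __name__ == "__main__":
    # Run this once inside each conda environment and compare the output.
    print("pyserini version:", installed_version("pyserini"))
    print("pyserini.index.lucene present:", has_module("pyserini.index.lucene"))
```

Running the script with each environment's own interpreter (for example the `/home/user/anaconda3/envs/bertserini/bin/python` from the error message) shows which pyserini release that environment actually loads.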

Is there an easy way around this? Sorry if this is a stupid question, but as a psychologist I have a rather weak background in computer science.

edit1: forgot to mention which command I used to create the index:

python -m pyserini.index.lucene \
  --collection JsonCollection \
  --input tests/resources/sample_collection_jsonl \
  --index indexes/sample_collection_jsonl \
  --generator DefaultLuceneDocumentGenerator \
  --threads 1 \
  --storePositions --storeDocvectors --storeRaw

Reijarmo, Nov 06 '22 12:11