pyndri

IOError: Indri repository contain more than one index.

Open: harit7 opened this issue (3 comments)

I have indexed a huge number of documents using IndriBuildIndex. I am able to run queries against that index using IndriRunQuery, but when I try to open the same index in pyndri I get the following error:

IOError: Indri repository contain more than one index.

harit7, Aug 29 '17 14:08

How many documents are you trying to index? Are you using any of the distributed indexing features of Indri?

There is no support for the case where Indri creates multiple indexes internally because I never ran into it myself. If you can give me an example of how to trigger this behaviour, I might be able to add support for it.

cvangysel, Sep 12 '17 04:09

I am indexing around 25 million documents. No, I am not using the distributed indexing features (I am not aware of them). However, because of memory issues I am indexing the documents in batches. Say the documents are split across two directories, A and B: first I index the documents in A into index directory C, and then I index the documents in B, also into C. In this scenario I end up with two indexes in the index directory, which is also visible in the manifest file.

harit7, Sep 12 '17 04:09
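To confirm the situation described above (two batches indexed into the same directory C), one could count how many indexes the repository's manifest lists. This is a minimal, unofficial sketch. It assumes the manifest is an XML file named `manifest` in the repository directory, with one `<index>` entry per index nested under an `<indexes>` element. That layout, and the helper names `count_manifest_indexes` and `count_repository_indexes`, are assumptions for illustration, not part of Indri's or pyndri's API.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_manifest_indexes(manifest_text):
    """Count <index> entries under <indexes> in an Indri manifest string.

    ASSUMPTION: the manifest looks like
    <parameters><indexes><index>0</index>...</indexes>...</parameters>.
    """
    root = ET.fromstring(manifest_text)
    indexes = root.find("indexes")  # direct child of the root element
    if indexes is None:
        return 0
    return len(indexes.findall("index"))


def count_repository_indexes(repo_dir):
    """Hypothetical helper: read <repo_dir>/manifest and count its indexes."""
    text = Path(repo_dir, "manifest").read_text()
    return count_manifest_indexes(text)


# Synthetic manifest mimicking two batches written into one repository.
sample = "<parameters><indexes><index>0</index><index>1</index></indexes></parameters>"
print(count_manifest_indexes(sample))
```

Under these assumptions, a count greater than one corresponds to the multi-index repository reported in this issue.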

Ah, I see. Can you provide a minimal example of configuration files and IndriBuildIndex invocations that reproduce this behaviour? I wasn't aware that this was possible!

I can't promise an ETA on this, but I will look at this when I find some time.

cvangysel, Sep 12 '17 04:09