pyndri
IOError: Indri repository contain more than one index.
I have indexed a huge number of documents using IndriBuildIndex. I am able to run queries with IndriRunQuery on the same index, but when I try to open the index in pyndri I get the following error:
IOError: Indri repository contain more than one index.
How many documents are you trying to index? Are you using any of the distributed indexing features of Indri?
There is no support for the scenario where Indri creates multiple indexes internally because I never ran into that case myself. If you can provide me with an example of how to trigger the behaviour, then I might be able to add support for it.
I am indexing around 25 million documents. No, I am not using the distributed indexing features (I am not aware of them). However, due to memory issues I am indexing the documents in batches. Say the docs are split across 2 directories, A and B: first you index the docs of directory A into index C, and then you index the docs of B into C as well. In this scenario I get 2 indexes in the index directory; this can also be seen in the manifest file.
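A rough sketch of the batch setup described above, in case it helps to reproduce the issue. It writes one IndriBuildIndex parameter file per batch, all pointing at the same index directory. The /data/... paths, the trectext corpus class, the 2G memory limit, and the helper names are placeholders assumed for illustration, not the exact settings used here, and I have not verified that this alone always produces a repository with more than one index.

```python
import subprocess
from pathlib import Path
from xml.sax.saxutils import escape


def build_params(index_dir, corpus_dir, corpus_class="trectext", memory="2G"):
    """Return the text of one IndriBuildIndex parameter file.

    corpus_class and memory are assumed defaults; adjust to your corpus.
    """
    return (
        "<parameters>\n"
        f"  <index>{escape(str(index_dir))}</index>\n"
        f"  <memory>{escape(memory)}</memory>\n"
        "  <corpus>\n"
        f"    <path>{escape(str(corpus_dir))}</path>\n"
        f"    <class>{escape(corpus_class)}</class>\n"
        "  </corpus>\n"
        "</parameters>\n"
    )


def write_batch_params(index_dir, corpus_dirs, out_dir):
    """Write one parameter file per corpus directory, all targeting index_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    param_files = []
    for i, corpus_dir in enumerate(corpus_dirs):
        param_file = out_dir / f"batch_{i}.xml"
        param_file.write_text(build_params(index_dir, corpus_dir))
        param_files.append(param_file)
    return param_files


def run_batches(param_files, binary="IndriBuildIndex"):
    """Run IndriBuildIndex once per parameter file, in order.

    Assumes the IndriBuildIndex binary is on PATH.
    """
    for param_file in param_files:
        subprocess.run([binary, str(param_file)], check=True)
```

Usage would be something like write_batch_params("/data/C", ["/data/A", "/data/B"], "params") followed by run_batches(...) on the returned list, so that A is indexed into C first and B is added to C in a second invocation.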
Ah, I see. Can you provide a minimal example of the configuration files and IndriBuildIndex invocations that trigger this behaviour? I wasn't aware that this was possible!
I can't promise an ETA on this, but I will look at this when I find some time.