lucene-s3directory
ArrayIndexOutOfBoundsException during reading of indexes.
Hello, I'm having a problem while reading some indexes from an S3 bucket.
In particular, searching for documents in my S3 bucket sometimes generates an error like this:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -17998 out of bounds for byte[22568].
This error happens most of the time, but occasionally the same indexes are read correctly and I get no error when running the same test several times.
My assumption is that this could be caused by a misconfiguration of the IndexReader, and maybe also of the buffer.
The indexes in the S3 bucket were generated with Lucene 7.7.3.
Hm, not sure what's causing that... How large are the index files?
The biggest files are around 20 KB. Thanks for the reply.

Can you show me the full stack trace from the logs, please?
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -17364 out of bounds for byte[18832]
	at java.base/java.lang.System.arraycopy(Native Method)
	at org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:130)
	at org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:138)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.document(CompressingStoredFieldsReader.java:555)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:571)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:578)
	at org.apache.lucene.index.CodecReader.document(CodecReader.java:84)
	at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:118)
	at org.apache.lucene.index.IndexReader.document(IndexReader.java:349)
	at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:316)
	at com.erudika.lucene.store.s3.ReadIndex.main(ReadIndex.java:29)
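The negative source index in the trace comes from LZ4's match-copy step: decompression copies bytes from a position `matchOffset` bytes behind the current write position, so if the compressed bytes are wrong (for example, read at the wrong file offset), the computed source index goes negative and `System.arraycopy` throws. A minimal stdlib-only sketch of that failure mode (the names and numbers here are illustrative, not Lucene's own code):

```java
// Illustration of how an LZ4-style match copy produces a negative
// arraycopy source index when the match offset is corrupt.
public class NegativeMatchOffsetDemo {

    // Simplified match copy: copy `len` bytes starting `matchOffset`
    // bytes behind the current decompression position `dOff`.
    static void copyMatch(byte[] dest, int dOff, int matchOffset, int len) {
        // If matchOffset > dOff, the source index (dOff - matchOffset) is
        // negative and System.arraycopy throws
        // ArrayIndexOutOfBoundsException, as in the report above.
        System.arraycopy(dest, dOff - matchOffset, dest, dOff, len);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[22568];
        try {
            // A bogus match offset pointing before the buffer start,
            // e.g. decoded from bytes fetched at the wrong S3 range.
            copyMatch(buf, 570, 18568, 16);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

This would suggest the bytes handed to the decompressor are not the bytes Lucene wrote, pointing at the S3 read path rather than at the index itself.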
I don't see a class named ReadIndex.java in the source code - is that your own code? How exactly are you reading the indexes from S3?
Yes, sorry, I didn't specify it. I made a custom test class to read the index from S3:
import java.io.IOException;

import org.apache.lucene.analysis.it.ItalianAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.erudika.lucene.store.s3.S3Directory;

public class ReadIndex {

    public static void main(String[] args) throws IOException, ParseException {
        Logger logger = LoggerFactory.getLogger(ReadIndex.class);
        S3Directory s3Directory = new S3Directory("s3.ambra.index.lucene");
        try (IndexReader indexReader = DirectoryReader.open(s3Directory)) {
            IndexSearcher searcher = new IndexSearcher(indexReader);
            QueryParser queryParser = new QueryParser("CONTENT", new ItalianAnalyzer());
            Query parsedQuery = queryParser.parse("oracle");
            TopDocs result = searcher.search(parsedQuery, 10000);
            logger.info("Result {}", result.scoreDocs.length);
            for (ScoreDoc scoreDoc : result.scoreDocs) {
                final Document document = searcher.doc(scoreDoc.doc);
                final String documentId = document.get("ID");
                final String table = document.get("TABLE");
                logger.info("{}_{}, {}", table, documentId, scoreDoc.score);
            }
        }
    }
}
I honestly have no idea what's going on. The tests pass, but I also cannot read any index that was manually uploaded to S3. There's a problem in the code that reads the index from S3, but I can't pinpoint it.
I will see what I can do but I can't promise a fix. Keep in mind that this is an experimental project which is not at all recommended for production use.
Thanks a lot! Is there anything I can help with?
If you can find the root cause of the problem, pull requests are welcome. I tried, but I get org.apache.lucene.index.CorruptIndexException or BufferUnderflowException when the code tries to read a non-existent file _XY.fnm.
Sorry - I give up.
The issue lies in this method: https://github.com/albogdano/lucene-s3directory/blob/41325a61cb52afb2eb301b80e68fe6ff9eba2909/src/main/java/com/erudika/lucene/store/s3/index/FetchOnBufferReadS3IndexInput.java#L189
Since Lucene 8.x the signature of that method has changed, and I don't know how to implement it.
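If the method in question is the `readInternal` override inherited from `BufferedIndexInput`, newer Lucene versions changed its signature from `readInternal(byte[] b, int offset, int length)` to one that receives a `ByteBuffer`. Assuming that is the change meant here, the old byte[]-based S3 fetch logic could be bridged with a small adapter; this standalone sketch uses a fake stand-in for the S3 read and is not taken from the actual project code:

```java
import java.nio.ByteBuffer;

// Hypothetical adapter: bridge byte[]-based read logic to a
// ByteBuffer-based entry point.
public class ReadInternalBridge {

    // Stand-in for the existing byte[]-based fetch (assumption: the old
    // code fills a byte[] region from a given file position). Here it
    // just writes fake, deterministic data.
    static void legacyRead(byte[] b, int offset, int length, long filePos) {
        for (int i = 0; i < length; i++) {
            b[offset + i] = (byte) ((filePos + i) & 0xFF);
        }
    }

    // New-style entry point: fill the buffer's remaining space, which
    // advances its position as the caller expects.
    static void readInternal(ByteBuffer dst, long filePos) {
        byte[] tmp = new byte[dst.remaining()];
        legacyRead(tmp, 0, tmp.length, filePos);
        dst.put(tmp);
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(8);
        readInternal(b, 100L);
        System.out.println("filled up to position " + b.position());
    }
}
```

The key detail is that the `ByteBuffer` variant must leave the buffer's position advanced past the bytes it wrote; copying the wrong range, or forgetting to advance the position, would hand the decompressor misaligned bytes, which is consistent with the negative-index failures reported above.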