lucene-s3directory

ArrayIndexOutOfBoundsException during reading of indexes.

Open Mattyeng opened this issue 3 years ago • 11 comments

Hello, I'm having a problem reading some indexes from an S3 bucket.

In particular, searching for documents in my S3 bucket sometimes fails with an error like this: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -17998 out of bounds for byte[22568]. The error happens most of the time, but occasionally the same indexes are read correctly and I get no error when running the same test several times. My assumption is that this comes from a misconfiguration of the IndexReader, or possibly of the read buffer. The indexes in the S3 bucket were generated with Lucene 7.7.3.

Mattyeng avatar Jan 21 '22 15:01 Mattyeng

Hm, not sure what's causing that... How large are the index files?

albogdano avatar Jan 21 '22 16:01 albogdano

The biggest files are around 20 KB, thanks for the reply. (screenshot attached)

Mattyeng avatar Jan 22 '22 15:01 Mattyeng

Can you show me the full stack trace from the logs, please?

albogdano avatar Jan 22 '22 15:01 albogdano

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: arraycopy: source index -17364 out of bounds for byte[18832]
        at java.base/java.lang.System.arraycopy(Native Method)
        at org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:130)
        at org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:138)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.document(CompressingStoredFieldsReader.java:555)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:571)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:578)
        at org.apache.lucene.index.CodecReader.document(CodecReader.java:84)
        at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:118)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:349)
        at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:316)
        at com.erudika.lucene.store.s3.ReadIndex.main(ReadIndex.java:29)

Mattyeng avatar Jan 22 '22 16:01 Mattyeng

I don't see a class named ReadIndex.java in the source code - is that your own code? How exactly are you reading the indexes from S3?

albogdano avatar Jan 22 '22 16:01 albogdano

Yes, sorry, I didn't specify that. I made a custom test class to read the index from S3:

    import java.io.IOException;

    import org.apache.lucene.analysis.it.ItalianAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import com.erudika.lucene.store.s3.S3Directory;

    public class ReadIndex {
        public static void main(String[] args) throws IOException, ParseException {
            Logger logger = LoggerFactory.getLogger(ReadIndex.class);
            // Open the index stored in the S3 bucket
            S3Directory s3Directory = new S3Directory("s3.ambra.index.lucene");
            try (IndexReader indexReader = DirectoryReader.open(s3Directory)) {
                IndexSearcher searcher = new IndexSearcher(indexReader);
                // Search the CONTENT field using the Italian analyzer
                QueryParser queryParser = new QueryParser("CONTENT", new ItalianAnalyzer());
                Query parsedQuery = queryParser.parse("oracle");
                TopDocs result = searcher.search(parsedQuery, 10000);
                logger.info("Result {}", result.scoreDocs.length);
                for (ScoreDoc scoreDoc : result.scoreDocs) {
                    final Document document = searcher.doc(scoreDoc.doc);
                    final String documentId = document.get("ID");
                    final String table = document.get("TABLE");
                    logger.info("{}_{}, {}", table, documentId, scoreDoc.score);
                }
            }
        }
    }

Mattyeng avatar Jan 24 '22 07:01 Mattyeng

I honestly have no idea what's going on. The tests pass, but I also cannot read any index that was manually uploaded to S3. There's a problem in the code that reads the index from S3, but I can't pinpoint it.

albogdano avatar Jan 24 '22 21:01 albogdano

I will see what I can do but I can't promise a fix. Keep in mind that this is an experimental project which is not at all recommended for production use.

albogdano avatar Jan 26 '22 11:01 albogdano

Thanks a lot! Is there anything I can do to help?

Mattyeng avatar Jan 27 '22 07:01 Mattyeng

If you can find the root cause of the problem, pull requests are welcome. I tried, but I get org.apache.lucene.index.CorruptIndexException or BufferUnderflowException when the code tries to read a non-existent file _XY.fnm. Sorry - I give up.

albogdano avatar Jan 27 '22 17:01 albogdano

The issue lies in this method: https://github.com/albogdano/lucene-s3directory/blob/41325a61cb52afb2eb301b80e68fe6ff9eba2909/src/main/java/com/erudika/lucene/store/s3/index/FetchOnBufferReadS3IndexInput.java#L189 The signature of that method changed in Lucene 8.x, and I don't know how to implement the new one.
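
For reference, a minimal sketch of what an adapted override might look like, assuming the newer Lucene API where BufferedIndexInput declares readInternal(ByteBuffer b) instead of the old readInternal(byte[] b, int offset, int length). The fetchBytes helper, which stands in for the existing S3 range-read logic in FetchOnBufferReadS3IndexInput, is hypothetical:

    // Sketch only - assumes the ByteBuffer-based readInternal() of newer Lucene versions.
    // fetchBytes() is a hypothetical helper wrapping the existing S3 fetch code.
    @Override
    protected void readInternal(ByteBuffer b) throws IOException {
        long position = getFilePointer();  // absolute position to read from
        int length = b.remaining();        // number of bytes the caller expects
        byte[] data = fetchBytes(position, length);
        b.put(data, 0, length);            // copy the fetched bytes into the caller's buffer
    }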

albogdano avatar Jan 27 '22 17:01 albogdano