excel-streaming-reader icon indicating copy to clipboard operation
excel-streaming-reader copied to clipboard

java.lang.IndexOutOfBoundsException with a workbook

Open sambit19 opened this issue 7 years ago • 2 comments

Hi,

I am trying to read a workbook with multiple sheets in it. The total size of the workbook is around 1.2 GB. I tested my code with a simple multi-sheet workbook and it worked fine. However with the bigger sheet I am getting this error. SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.loadSheets(StreamingWorkbookReader.java:121) at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.init(StreamingWorkbookReader.java:104) at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.init(StreamingWorkbookReader.java:77) at com.monitorjbl.xlsx.StreamingReader$Builder.open(StreamingReader.java:228) at XLStreamReader.main(XLStreamReader.java:30)

The code to check: ` public class XLStreamReader {

/*
 * args[0] = Source file name,
 * args[1] = sheet position,
 * args[2] = destination file name,
 * args[3] = no of rows to read in a batch
 */
public static void main(String[] args){
    InputStream is = null;
    Workbook workbook = null;
    PrintWriter pw = null;
    StringBuilder sb = null;
    try {
        is = new FileInputStream(new File(args[0]));
        workbook = StreamingReader.builder()
                .rowCacheSize(Integer.parseInt(args[3]))    // number of rows to keep in memory (defaults to 10)
                .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
                .open(is);

        Sheet sheet = workbook.getSheetAt(Integer.parseInt(args[1]));
        String targetfile = args[2]+File.separator+sheet.getSheetName()+".csv";
        pw = new PrintWriter(new File(targetfile));
        sb = new StringBuilder();
        for (Row r : sheet) {
            StringBuilder line = new StringBuilder();
            for (Cell c : r) {
                line.append(c.getStringCellValue());
                line.append(";");
            }
            line = new StringBuilder(line.substring(0, line.length()-1));
            sb.append(line);
            sb.append(String.format("%n"));
        }
        pw.write(sb.toString());
        pw.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

}

}

`

sambit19 avatar Apr 26 '17 13:04 sambit19

Hi,

FYI It doesn't work with latest apache POI 3.16. Tested with previous 3.15 with successful iterating through all dataset.

alusbars avatar May 04 '17 10:05 alusbars

I think this is what is being reported in #91 . The cause seems to be that POI returns duplicate sheets for some workbooks but not duplicate properties. I'm not sure what actually causes this to happen, but I was able to work around it by just verifying that the sheet URIs are unique. I pushed the fix to the master branch, can you give it a try with the latest code?

monitorjbl avatar May 09 '17 03:05 monitorjbl