excel-streaming-reader
excel-streaming-reader copied to clipboard
java.lang.IndexOutOfBoundsException with a workbook
Hi,
I am trying to read a workbook with multiple sheets in it. The total size of the workbook is around 1.2 GB. I tested my code with a simple multi-sheet workbook and it worked fine. However with the bigger sheet I am getting this error.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.loadSheets(StreamingWorkbookReader.java:121) at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.init(StreamingWorkbookReader.java:104) at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.init(StreamingWorkbookReader.java:77) at com.monitorjbl.xlsx.StreamingReader$Builder.open(StreamingReader.java:228) at XLStreamReader.main(XLStreamReader.java:30)
The code to check: ` public class XLStreamReader {
/*
* args[0] = Source file name,
* args[1] = sheet position,
* args[2] = destination file name,
* args[3] = no of rows to read in a batch
*/
public static void main(String[] args){
InputStream is = null;
Workbook workbook = null;
PrintWriter pw = null;
StringBuilder sb = null;
try {
is = new FileInputStream(new File(args[0]));
workbook = StreamingReader.builder()
.rowCacheSize(Integer.parseInt(args[3])) // number of rows to keep in memory (defaults to 10)
.bufferSize(4096) // buffer size to use when reading InputStream to file (defaults to 1024)
.open(is);
Sheet sheet = workbook.getSheetAt(Integer.parseInt(args[1]));
String targetfile = args[2]+File.separator+sheet.getSheetName()+".csv";
pw = new PrintWriter(new File(targetfile));
sb = new StringBuilder();
for (Row r : sheet) {
StringBuilder line = new StringBuilder();
for (Cell c : r) {
line.append(c.getStringCellValue());
line.append(";");
}
line = new StringBuilder(line.substring(0, line.length()-1));
sb.append(line);
sb.append(String.format("%n"));
}
pw.write(sb.toString());
pw.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
`
Hi,
FYI It doesn't work with latest apache POI 3.16. Tested with previous 3.15 with successful iterating through all dataset.
I think this is what is being reported in #91 . The cause seems to be that POI returns duplicate sheets for some workbooks but not duplicate properties. I'm not sure what actually causes this to happen, but I was able to work around it by just verifying that the sheet URIs are unique. I pushed the fix to the master branch, can you give it a try with the latest code?