fastexcel icon indicating copy to clipboard operation
fastexcel copied to clipboard

parallel().forEach() is not working properly.

Open Karthick-Shanmugam-15 opened this issue 4 years ago • 5 comments

i am uploading excel file with record count of 10K. when i use foreach, its working properly. but when i try to use parallel.forEach(), its not read all the values from excel.

the following code is not reading all the data from excel. its so inconsistent.

try (InputStream is =new FileInputStream(new File(fileLocation));  ReadableWorkbook wb = new ReadableWorkbook(is)) {
			Sheet sheet = wb.getFirstSheet();
``				try (Stream<Row> rows = sheet.openStream()) {
					rows.skip(2).parallel().forEach(r -> {
						BigDecimal bDec = r.getCellAsNumber(0).orElse(null);
						if(bDec!=null)
							obj.add(bDec.intValue());
					});
				}		
		}

Please correct me if i am doing anything wrongly.

Karthick-Shanmugam-15 avatar Sep 15 '20 07:09 Karthick-Shanmugam-15

I am afraid the underlying code in fastexcel-reader is not thread-safe, so you get unexpected results when using a parallel stream. I am not sure what it would take to implement thread-safety, but it looks like quite a bit of work. Also, we rely on some classes from org.apache.poi to read worksheets. These classes do not look thread-safe, so we may have to synchronize their usage and loose all benefits of parallelism in the end. However this is an interesting topic and this could make fastexcel-reader even faster, so contributions on this subject are welcome!

ochedru avatar Sep 16 '20 07:09 ochedru

Yeah reader is not thread safe but still fasterexcel-reader is great. Would be better if we can use a parallel stream.

Karthick-Shanmugam-15 avatar Sep 19 '20 16:09 Karthick-Shanmugam-15

I have added a test to ensures that streams returned by fastexcel do work properly when in parallel mode - https://github.com/dhatim/fastexcel/commit/f2bee0f28be2a206acf465e1920a5005ccdcff2f - and they do behave the same as in sequential mode.

I mean, I don't expect them to actually run the operations in parallel, as RowSpliterator does not support partitioning - https://github.com/dhatim/fastexcel/blob/93327011f2237cd26b4b94ca296f9ca2259b1fe4/fastexcel-reader/src/main/java/org/dhatim/fastexcel/reader/RowSpliterator.java#L58 - but they should work properly nevertheless.

Looking at you code, I'm concerned about the obj.add(). What is obj? Is the add() method thread safe?

rzymek avatar Jan 05 '21 19:01 rzymek

I am afraid the underlying code in fastexcel-reader is not thread-safe, so you get unexpected results when using a parallel stream. I am not sure what it would take to implement thread-safety, but it looks like quite a bit of work. Also, we rely on some classes from org.apache.poi to read worksheets. These classes do not look thread-safe, so we may have to synchronize their usage and loose all benefits of parallelism in the end. However this is an interesting topic and this could make fastexcel-reader even faster, so contributions on this subject are welcome!

@ochedru Is the fastexcel-writer thread safe?

epiard13 avatar Mar 02 '21 18:03 epiard13

I have added a test to ensures that streams returned by fastexcel do work properly when in parallel mode - f2bee0f - and they do behave the same as in sequential mode.

I mean, I don't expect them to actually run the operations in parallel, as RowSpliterator does not support partitioning -

https://github.com/dhatim/fastexcel/blob/93327011f2237cd26b4b94ca296f9ca2259b1fe4/fastexcel-reader/src/main/java/org/dhatim/fastexcel/reader/RowSpliterator.java#L58

  • but they should work properly nevertheless. Looking at you code, I'm concerned about the obj.add(). What is obj? Is the add() method thread safe?

there comes a doubt. when RowSpliterator does not support paritioning, then why parallel Stream is not working properly. the same obj.add() is working fine when I just use stream. if parallel process is not happing, then obj.add() should work even if using parallelstream. if not wrong, Parallel process is happening. thats why Obj.add() is not working as list(Obj.add()) is not thread safe.

Please correct me if am wrong.

Karthick-Shanmugam-15 avatar Mar 03 '21 05:03 Karthick-Shanmugam-15