imagej-ops

Fractal Dimension creates thousands of zombie threads that crash ImageJ

Open mdoube opened this issue 3 years ago • 4 comments

BoneJ's Fractal Dimension plugin uses an Op to do the box-counting maths. On larger images with many boxes to count, it adopts a multithreading approach, using a standard call to get the number of available processors:

https://github.com/imagej/imagej-ops/blob/9dad3f91ebd45cbeb0a46757d0918d43d379204f/src/main/java/net/imagej/ops/topology/BoxCount.java#L196
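
The linked code is not reproduced here. As a minimal sketch of how this kind of leak can arise, assume that each call creates a fresh fixed thread pool sized from the processor count and never shuts it down. The class and method names below (`LeakyPoolSketch`, `leakyCount`) are hypothetical and the tasks are trivial placeholders, not the real box-counting code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LeakyPoolSketch {
    // Every worker thread created by this sketch, so live threads can be counted.
    static final List<Thread> WORKERS = new CopyOnWriteArrayList<>();

    // Hypothetical stand-in for one call of the op: one new pool per call, never shut down.
    static long leakyCount(int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // daemon only so that this demo JVM can still exit
            WORKERS.add(t);
            return t;
        });
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            final long part = i;
            Callable<Long> task = () -> part; // placeholder for real work
            futures.add(pool.submit(task));
        }
        long total = 0;
        try {
            for (Future<Long> f : futures) total += f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return total; // no pool.shutdown(): the idle workers stay alive
    }

    static long liveWorkers() {
        return WORKERS.stream().filter(Thread::isAlive).count();
    }

    public static void main(String[] args) {
        // The standard call for the number of available processors.
        int nThreads = Runtime.getRuntime().availableProcessors();
        for (int call = 1; call <= 5; call++) {
            leakyCount(nThreads);
            System.out.println("after call " + call + ": " + liveWorkers() + " live worker threads");
        }
    }
}
```

In this sketch the number of live worker threads grows by `nThreads` on every call, because the idle workers of a fixed pool wait for new tasks until the pool is shut down.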

Fractal Dimension crashes ImageJ after a few iterations of a batch job, printing the following to the console:

Java HotSpot(TM) 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
<repeated many times>
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f318501e000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/mdoube/Fiji.app.dev/hs_err_pid379776.log
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f7f3bbcb000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
[thread 139913374230272 also had an error]

A stack trace is only sometimes printed to the console, like this one:

[ERROR] Module threw error
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:717)
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
	at net.imagej.ops.topology.BoxCount.countForegroundBoxes(BoxCount.java:229)
	at net.imagej.ops.topology.BoxCount.lambda$countTranslatedGrids$0(BoxCount.java:192)
	at java.util.stream.ReferencePipeline$5$1.accept(ReferencePipeline.java:227)
	at java.util.stream.SpinedBuffer$1Splitr.forEachRemaining(SpinedBuffer.java:364)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.LongPipeline.reduce(LongPipeline.java:443)
	at java.util.stream.LongPipeline.min(LongPipeline.java:401)
	at net.imagej.ops.topology.BoxCount.calculate(BoxCount.java:139)
	at net.imagej.ops.topology.BoxCount.calculate(BoxCount.java:74)
	at org.bonej.wrapperPlugins.FractalDimensionWrapper.lambda$run$0(FractalDimensionWrapper.java:183)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at org.bonej.wrapperPlugins.FractalDimensionWrapper.run(FractalDimensionWrapper.java:174)
	at org.scijava.command.CommandModule.run(CommandModule.java:196)
	at org.scijava.module.ModuleRunner.run(ModuleRunner.java:165)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:124)
	at org.scijava.module.ModuleRunner.call(ModuleRunner.java:63)
	at org.scijava.thread.DefaultThreadService.lambda$wrap$2(DefaultThreadService.java:225)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Profiling active threads looks like this:

Screenshot from 2021-06-08 10-39-16

Between 3000 and 5000 threads are created on each iteration, and none of them complete, terminate or get removed.

Setting processors = 1, which makes the algorithm single-threaded, avoids the crash but also means that large images are slow to analyse.

A safer way to multithread is needed for box counting.

See also: https://forum.image.sc/t/memory-issues-with-bonej/53589

mdoube avatar Jun 07 '21 08:06 mdoube

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/memory-issues-with-bonej/53589/4

imagesc-bot avatar Jun 08 '21 01:06 imagesc-bot

Anisotropy uses a similar ExecutorService approach to multithreading and includes a shutdownAndAwaitTermination() method that appears to clean up old threads.

https://github.com/bonej-org/BoneJ2/blob/da5aa63cdc15516605e8dcb77458eb34b0f00b85/Modern/wrapperPlugins/src/main/java/org/bonej/wrapperPlugins/AnisotropyWrapper.java#L380
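
For reference, a cleanup helper of this kind usually follows the two-phase shutdown pattern shown in the `ExecutorService` Javadoc. The sketch below is that general pattern, not a copy of the BoneJ method: the class name `PoolCleanupSketch`, the `boolean` return value and the 60-second timeouts are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolCleanupSketch {
    // Two-phase shutdown: stop accepting tasks, wait, then cancel whatever is left.
    // Returns true if the pool has terminated by the time the method returns.
    static boolean shutdownAndAwaitTermination(ExecutorService pool) {
        pool.shutdown(); // no new tasks; queued and running tasks may finish
        try {
            if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks that are still running
                if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                    System.err.println("Pool did not terminate");
                }
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return pool.isTerminated();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("working"));
        System.out.println("terminated: " + shutdownAndAwaitTermination(pool));
    }
}
```

Once the pool has terminated, its worker threads have exited, so they no longer accumulate between runs.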

Fractal Dimension may be creating threads in a messy way and not tidying up after itself.

mdoube avatar Jun 08 '21 03:06 mdoube

Fixed by calling shutdown() on the ExecutorService in net.imagej.ops.morphology.outline.Outline and net.imagej.ops.topology.BoxCount.
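
A minimal sketch of this kind of fix, assuming a pool that is created per call: put `shutdown()` in a `finally` block so that it runs even if a task fails. This is not the actual patch. `ShutdownInFinallySketch`, `countForeground` and the chunked pixel counting are hypothetical stand-ins for the real ops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShutdownInFinallySketch {
    // Worker threads created by this sketch, tracked only to show that they exit.
    static final List<Thread> WORKERS = new CopyOnWriteArrayList<>();

    // Counts true pixels in parallel chunks, then always shuts the pool down.
    static long countForeground(boolean[] pixels, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads, r -> {
            Thread t = new Thread(r);
            WORKERS.add(t);
            return t;
        });
        try {
            int chunk = (pixels.length + nThreads - 1) / nThreads;
            List<Future<Long>> futures = new ArrayList<>();
            for (int start = 0; start < pixels.length; start += chunk) {
                final int from = start;
                final int to = Math.min(pixels.length, start + chunk);
                Callable<Long> task = () -> {
                    long n = 0;
                    for (int i = from; i < to; i++) if (pixels[i]) n++;
                    return n;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // lets the idle workers exit instead of lingering
        }
    }

    // Waits briefly for the workers to exit, then reports how many are still alive.
    static long liveWorkersAfterJoin() {
        for (Thread t : WORKERS) {
            try {
                t.join(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return WORKERS.stream().filter(Thread::isAlive).count();
    }

    public static void main(String[] args) {
        boolean[] pixels = new boolean[1000];
        for (int i = 0; i < pixels.length; i += 3) pixels[i] = true;
        int nThreads = Runtime.getRuntime().availableProcessors();
        System.out.println("foreground: " + countForeground(pixels, nThreads));
        System.out.println("live workers: " + liveWorkersAfterJoin());
    }
}
```

`shutdown()` does not cancel anything; it only stops the pool from accepting new tasks, and the workers exit once the submitted tasks have finished. That is enough here because the method has already waited on every `Future`.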

Screenshot from 2021-08-13 17-38-22

mdoube avatar Aug 13 '21 09:08 mdoube

PR #624 should be applied as well because it has a better threading model for Outline.

mdoube avatar Aug 17 '21 04:08 mdoube