Provide a memory safe version of I/O methods

Open WillCMcC opened this issue 2 years ago • 6 comments

There are surprisingly low limits on how much RAM a service worker can consume (4 GB), and even more surprising limits on total RAM usage per tab on Windows Chromium machines (8 GB).

In our testing, it takes ~4x the file size in allocated RAM to run readImageDicomFileSeries, so calling readImageDicomFileSeries on a 250 MB file series makes the browser use about 1 GB of RAM. Also in our testing, no browser can successfully process 1 GB of DICOM without running out of memory.
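As a rough illustration of the arithmetic above, here is a minimal sketch of a pre-flight check. The 4x multiplier and the 4 GB worker limit are the figures observed in this issue, not guarantees, and `estimatePeakBytes` and `fitsInWorker` are hypothetical helper names, not part of itk-wasm.

```typescript
// Rough peak-memory estimate using the ratios reported in this issue.
// These constants are observations, not documented limits.
const MiB = 1024 * 1024;
const GiB = 1024 * MiB;

const READ_MULTIPLIER = 4;    // observed: ~4x the file size while reading
const WORKER_LIMIT = 4 * GiB; // observed: ~4 GB per worker

// Hypothetical helper: estimated peak allocation for a series of the given size.
function estimatePeakBytes(seriesBytes: number, multiplier: number = READ_MULTIPLIER): number {
  return seriesBytes * multiplier;
}

// Hypothetical helper: true only if the estimate stays strictly below the limit,
// since the worker also needs room for its own runtime.
function fitsInWorker(seriesBytes: number, limit: number = WORKER_LIMIT): boolean {
  return estimatePeakBytes(seriesBytes) < limit;
}

console.log(estimatePeakBytes(250 * MiB) / MiB); // 1000 (about 1 GB)
console.log(fitsInWorker(250 * MiB));            // true
console.log(fitsInWorker(1 * GiB));              // false: 4 x 1 GB reaches the limit
```

Under these assumptions, a 1 GB series lands right at the 4 GB worker limit, which would be consistent with the failures we see.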

In our testing, at rest, vtk-js uses ~2x the file size in allocated RAM to display the image volume. This means ITK is the bottleneck: we are unable to process files with itk-wasm that we should be able to view just fine.

Is it possible to create a memory-safe or throttled version of these utilities that guarantees they won't run the browser or thread out of memory?

WillCMcC avatar Jun 02 '22 20:06 WillCMcC

I think providing a memory-bounded version would be difficult, especially in the context of generating a volume from DICOM slices.

As for how much RAM readImageDicomFileSeries takes, that will need investigation. With both ImageReader and GDCM internals, I don't exactly know where the data is being copied.

floryst avatar Jun 14 '22 19:06 floryst

@WillCMcC, in your use case, is singleSortedSeries == true? If so, we could reduce peak memory usage, at least to some degree, here:

https://github.com/InsightSoftwareConsortium/itk-wasm/blob/9b02b685bb3eea5fb479dc8a39a4cdc853b6e88e/src/io/readImageDICOMArrayBufferSeries.ts#L73-L79

thewtex avatar Jun 20 '22 18:06 thewtex

If passing in an already sorted series will reduce peak memory usage, one idea is to have a utility that can take in a list of files and return them in sorted order, which can then be passed into readImageDICOMSeries. This would be adding support for breaking up the steps of readImageDICOMSeries.
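To make the idea concrete, here is a minimal sketch of what such a sorting utility could look like. It assumes the per-slice metadata has already been parsed elsewhere, and it orders slices by projecting Image Position (Patient) onto the slice normal, which is a common approach and not necessarily what itk-wasm or GDCM does internally. `SliceMeta` and `sortSlices` are hypothetical names.

```typescript
type Vec3 = [number, number, number];

// Hypothetical per-slice metadata; parsing it out of each file is out of scope here.
interface SliceMeta {
  name: string;
  position: Vec3; // Image Position (Patient), tag (0020,0032)
  orientation: [number, number, number, number, number, number]; // Image Orientation (Patient), tag (0020,0037)
}

function cross(a: Vec3, b: Vec3): Vec3 {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

function dot(a: Vec3, b: Vec3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Returns a new array ordered along the slice normal; the input is not mutated.
// Assumes every slice shares the orientation of the first one.
function sortSlices(slices: SliceMeta[]): SliceMeta[] {
  if (slices.length === 0) return [];
  const o = slices[0].orientation;
  const normal = cross([o[0], o[1], o[2]], [o[3], o[4], o[5]]);
  return [...slices].sort((a, b) => dot(a.position, normal) - dot(b.position, normal));
}

// Usage with three axial slices given out of order.
const axial: SliceMeta["orientation"] = [1, 0, 0, 0, 1, 0];
const sorted = sortSlices([
  { name: "b.dcm", position: [0, 0, 2.5], orientation: axial },
  { name: "a.dcm", position: [0, 0, 0], orientation: axial },
  { name: "c.dcm", position: [0, 0, 5], orientation: axial },
]);
console.log(sorted.map((s) => s.name)); // [ 'a.dcm', 'b.dcm', 'c.dcm' ]
```

The sorted file list could then be handed to the reader with singleSortedSeries set, so the reader does not need to sort the series itself.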

floryst avatar Jun 21 '22 15:06 floryst

Some notes and questions on further investigation:

  • What memory are you measuring/how are you measuring your allocated memory? Are you measuring the main thread, the web worker, or the browser processes (or something else)?
  • For my sample dataset, I'm noticing the data type is Float64. This naturally lends itself to a larger memory footprint for larger datasets. In fact, after reading my DICOM files into an image, my heap is roughly 4x the total byte size of my files (8x the number of pixels in my image). Deleting the image and running GC returns the allocated heap to the baseline (before reading the DICOM files).
    • What is the datatype for your DICOM files?
  • The webworker itself takes up quite a bit of memory. ALLOW_MEMORY_GROWTH is enabled, which means the webworker will expand its memory as needed. Note that there is no memory shrinkage yet, so the allocated memory will remain at peak, even if the routines no longer need that much memory.
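The 4x and 8x figures in the second note can be reproduced with a quick calculation. This is only a sketch: it assumes the files store 16-bit pixels (which the 4x/8x ratio implies), ignores DICOM headers and any compression, and uses a made-up volume size.

```typescript
// Bytes per pixel for the stored and the in-memory pixel types.
const storedBytesPerPixel = Int16Array.BYTES_PER_ELEMENT;   // 2
const loadedBytesPerPixel = Float64Array.BYTES_PER_ELEMENT; // 8

// Hypothetical volume: 100 slices of 512 x 512 pixels.
const pixelCount = 512 * 512 * 100;

const bytesOnDisk = pixelCount * storedBytesPerPixel;
const bytesInMemory = pixelCount * loadedBytesPerPixel;

console.log(bytesInMemory / bytesOnDisk); // 4 (heap vs. file bytes)
console.log(bytesInMemory / pixelCount);  // 8 (heap bytes per pixel)
```

If the image stayed int16 in memory, the same volume would need a quarter of that.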

floryst avatar Jun 22 '22 19:06 floryst

I have been measuring the browser process. The Chrome task manager can give per-thread granularity, but I'm finding the browser process to be a good signal, and the resulting errors give insight into what is crashing: if the browser stops responding (white screen), it is usually a main-thread OOM; if an image fails to load with an allocation error from itk-wasm internals, it's usually a worker thread running out of memory.

It looks like our files are int16, and the transfer syntax is Explicit VR Little Endian.

WillCMcC avatar Jun 23 '22 18:06 WillCMcC

> if the browser stops responding (white screen) it is usually the main thread OOM

This is an interesting case. The webworker should just throw an error, but not crash, if it tries to allocate more memory. In emscripten, any attempts to grow the memory block beyond what the webworker can provide will trigger undefined behavior, which just results in a console error. OOM on the main thread is indicative of something more serious. I still suspect the webworker of having a hand in this, but I would need to see this happen to get further ideas.
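For reference, here is a minimal sketch of how a failed growth looks at the raw WebAssembly API level, outside of Emscripten. The Emscripten runtime wraps this call, so what itk-wasm actually reports may differ; `tryGrow` is a hypothetical helper and the tiny page limits are only for illustration.

```typescript
const PAGE_BYTES = 64 * 1024; // WebAssembly page size

// A memory that starts at 1 page and may grow to at most 2 pages.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });

// Hypothetical helper: attempts to grow and reports success instead of throwing.
function tryGrow(mem: WebAssembly.Memory, pages: number): boolean {
  try {
    mem.grow(pages);
    return true;
  } catch (err) {
    // Growing past the maximum surfaces as a catchable RangeError.
    if (err instanceof RangeError) return false;
    throw err;
  }
}

const firstGrow = tryGrow(memory, 1);  // stays within the maximum
const secondGrow = tryGrow(memory, 1); // would exceed the maximum

console.log(firstGrow, secondGrow);                  // true false
console.log(memory.buffer.byteLength / PAGE_BYTES);  // 2
```

The failed call leaves the memory at its previous size and the thread keeps running, which is why a worker-side allocation failure should be recoverable in principle.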

floryst avatar Jun 23 '22 20:06 floryst