porespy
porespy copied to clipboard
Lower memory footprint for `porosimetry` and `local_thickness`
Hi,
Thank you for all the hard work on this package. This is a treasure trove!
I have to use local_thickness
of huge tomography dataset (1500 x 1500 x 1200 voxels) and I had a hard time to do so even on a workstation with 64 Gb of RAM. Therefore, I looked into the code, and found a way to reduced the memory footprint. I implemented it on my side, it seems to work, so I though I could share it here.
At the moment, porosimetry
returns an array of pore sizes, one float per voxel of the original image. In my case, this is 21Gb in memory just for the output. However, the values stored in this output array are discreet. By construction, only some values of sizes are tested. The default if 25 sizes, spaced logarithmically.
What I propose, is to return two arrays:
- sizes: A 1D array of floats containing all the tested sizes, typically 25 values
- indices: An array of unsigned ints that contains the index of the pore size for each pixel.
If we allow at most 255 possible discreet sizes, the type of the second array can be uint8, taking less than 3Gb in my case. If we allow all 65535 values that fit in uint16 (crazy use case), the second array is still less than 6 Gb in my case. This would leave much most space to perform the rest of the algorithm (Euclidian distance transform, fft dilation, etc.).
If one wants the original array of floats, it is as simple as sizes[indices]. However, if one wants the pore size distribution, it is faster to compute the histogram of the indices and to plot it function of sizes.
In practice, here are the changes in the code of porespy.filters._funcs.porosimetry
Change
imresults = np.zeros(np.shape(im))
for r in tqdm(sizes, **settings.tqdm):
imtemp = dt >= r
...
imresults[(imresults == 0) * imtemp] = r
to
imresults = np.zeros(np.shape(im), np.uint8)
for ir in tqdm(range(len(sizes)), **settings.tqdm):
r = sizes[ir]
imtemp = dt >= r
...
imresults[(imresults == 0) * imtemp] = ir+1
(To be done 3 times for the 3 possible modes)
And at the end of the function return np.concatenate(([0], sizes)), imresults
instead of return imresults
.
(You may notice that the indices are shifted by +1 in order to accommodate 0 as the index of the foreground).
I thought I had replied to this when you posted it, but see that I did not, sorry for ignoring this. Thanks for this suggestion. I have not acted on it yet because I am in the process of re-writting some of the local thickness related functions.