
[Feature] Parallelize kwavearray functions

Open faberno opened this issue 1 year ago • 12 comments

Is your feature request related to a problem? Please describe. I was wondering if it's possible to parallelize get_array_binary_mask and combine_sensor_data of the kWaveArray class. For simulations with many array elements, these functions are a major bottleneck.

Describe the solution you'd like Currently, these functions contain a loop that iterates over every element. The iterations are independent of each other, so in theory it should be possible to multiprocess them. This would probably require some refactoring to avoid copying the kWaveArray and kGrid objects for every worker.
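To make the idea concrete, here is a minimal sketch of what dispatching the independent iterations to a worker pool could look like. All names are hypothetical: `compute_element_weights` stands in for the real per-element work (`self.get_off_grid_points`), and the grid is a plain shape tuple rather than a kGrid, precisely to avoid copying the heavy objects into every worker. A `ProcessPoolExecutor` would be a drop-in swap if the per-element work is pure-Python heavy.

```python
# Hedged sketch -- not k-wave-python API. compute_element_weights is a
# placeholder for the expensive per-element call in get_array_binary_mask.
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def compute_element_weights(ind, grid_size):
    """Placeholder for one element's mask contribution."""
    mask = np.zeros(grid_size, dtype=bool)
    mask[ind % grid_size[0], :] = True
    return mask


def parallel_binary_mask(number_elements, grid_size, workers=4):
    """OR together per-element masks computed by a pool of workers."""
    mask = np.zeros(grid_size, dtype=bool)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            compute_element_weights,
            range(number_elements),
            [grid_size] * number_elements,
        )
        for element_mask in results:
            mask |= element_mask
    return mask
```

Only the cheap inputs (an element index and a shape) cross the worker boundary here; that is the refactoring mentioned above.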

faberno avatar Dec 01 '24 20:12 faberno

This is a great point. Should be easy to implement. Thanks for your feedback.

-Walter

waltsims avatar Dec 01 '24 22:12 waltsims

What would be the best approach? There are a few things that can be done to accelerate the code: refactoring with list comprehensions to remove loops; joblib; JIT with numba; cupy.
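As a small illustration of the first option (a self-contained sketch; `or_loop` and `or_vectorized` are made-up names, not k-wave-python functions): replacing a Python-level OR loop over element masks with a single vectorized reduction already removes most of the interpreter overhead, before any joblib/numba/cupy machinery is involved.

```python
import numpy as np


def or_loop(masks):
    # element-by-element, mirroring the current per-element OR
    out = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        out = np.bitwise_or(out, m)
    return out


def or_vectorized(masks):
    # one vectorized reduction replaces len(masks) separate OR passes
    return np.any(np.stack(masks), axis=0)
```

Both return identical results; the vectorized form trades a little peak memory (the stacked array) for far fewer Python-level iterations.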

djps avatar Dec 03 '24 14:12 djps

I just looked into get_array_binary_mask, and just by refactoring it a bit I could reduce the runtime for 10,000 elements from 190 to 8 seconds.

This is the original loop:

for ind in range(self.number_elements):
    # expensive per-element call: integration points + grid weights
    grid_weights = self.get_off_grid_points(kgrid, ind, True)
    mask = np.bitwise_or(np.squeeze(mask), grid_weights)

In self.get_off_grid_points we first calculate the integration_points (fast) and then the grid_weights (slow). So I first computed the integration points for all elements (~1 second), stacked them into one array, and passed them to off_grid_points(...) all at once, which saves a lot of unnecessary calls to off_grid_points(...), and we no longer need to OR the masks together.
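The batching described above might look roughly like this. It is only a sketch: `integration_points_for` and `off_grid_weights` are hypothetical stand-ins for the fast and slow halves of the real kWaveArray internals.

```python
import numpy as np


def integration_points_for(ind):
    # stand-in for the fast per-element integration_points computation
    rng = np.random.default_rng(ind)
    return rng.random((8, 3))  # 8 points per element in 3-D


def off_grid_weights(points, grid_shape):
    # stand-in for the slow weight computation; the real code evaluates
    # band-limited interpolants on the grid for every point
    idx = (points[:, 0] * grid_shape[0]).astype(int) % grid_shape[0]
    mask = np.zeros(grid_shape, dtype=bool)
    mask[idx, 0, 0] = True
    return mask


def batched_mask(number_elements, grid_shape):
    # cheap step for every element first ...
    pts = np.concatenate(
        [integration_points_for(i) for i in range(number_elements)]
    )
    # ... then a single expensive call on the stacked points, so no
    # per-element OR of intermediate masks is needed
    return off_grid_weights(pts, grid_shape)
```

The key point is that the expensive routine runs once on the concatenated points instead of once per element.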

faberno avatar Dec 03 '24 22:12 faberno

I could also reduce combine_sensor_data from around 600 s to 37 s (for 10,000 elements), without any major changes or parallelization. Should I open a draft PR with these changes, where we can discuss them and think of more optimizations?

faberno avatar Dec 05 '24 15:12 faberno

That would be great. Thanks @faberno!

waltsims avatar Dec 06 '24 04:12 waltsims

@faberno should we try to get these updates into v0.4.1 in the new year?

waltsims avatar Dec 24 '24 19:12 waltsims

That would be great. I will open my promised PR tomorrow.

faberno avatar Dec 24 '24 20:12 faberno

Hi @faberno, I see you opened a PR to address the above, and I was wondering if you had gotten any further with it? For my application, I waste a lot of time recomputing grid_weights, etc., and would be happy to take this on should you not have the bandwidth.

precicely avatar Apr 12 '25 20:04 precicely

Hey @precicely, I would be very happy for your support on this issue!

faberno avatar Apr 15 '25 08:04 faberno

Hey @faberno, did you ever start on a branch for this or can I come at it with a clean slate?

precicely avatar Apr 29 '25 17:04 precicely

Also @waltsims, can you assign this one to me so it doesn't fall off my radar?

precicely avatar Apr 29 '25 17:04 precicely

Hey @precicely, you can find my progress in the speedup_karray branch of my fork. You can also find it in my draft PR. But of course, feel free to start fresh if you prefer!

faberno avatar Apr 30 '25 14:04 faberno