Optimization Suggestion: Replace np.ediff1d with array slicing for faster difference-based splitting
https://github.com/MDAnalysis/mdanalysis/blob/e64755cb1999c23bc7e0a2283644ad5a50271809/package/MDAnalysis/lib/util.py#L1862
Hi, I’d like to suggest a performance improvement in the following line:
return np.split(arr, np.where(np.ediff1d(arr) - 1 > 0)[0] + 1)
This can be rewritten more efficiently as:
diff = arr[1:] - arr[:-1]
return np.split(arr, np.where(diff > 1)[0] + 1)
Although np.ediff1d is designed to compute differences between adjacent elements, it adds overhead on top of the subtraction itself: an extra function call, conversion and flattening of the input, and handling of the optional to_begin and to_end arguments. For a 1-D array, arr[1:] - arr[:-1] produces the same differences without those steps. Both forms still allocate one temporary difference array, so the saving is a roughly fixed per-call cost rather than one that grows with array size; it should be most noticeable when the function is called many times on small arrays.
Since the difference array is only used to locate split indices, np.ediff1d offers no benefit here over simple slicing. The replacement also makes the condition easier to read, because diff > 1 states the gap test directly instead of np.ediff1d(arr) - 1 > 0.
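To make the claimed equivalence concrete, here is a minimal sketch that runs both forms on a small sorted integer array and compares the resulting groups. The function names split_ediff1d and split_slicing and the sample array are hypothetical, chosen only for illustration; they are not the names used in MDAnalysis.

```python
import numpy as np


def split_ediff1d(arr):
    # Current form: split wherever the gap between neighbours exceeds 1.
    return np.split(arr, np.where(np.ediff1d(arr) - 1 > 0)[0] + 1)


def split_slicing(arr):
    # Proposed form: same differences, computed by slicing.
    diff = arr[1:] - arr[:-1]
    return np.split(arr, np.where(diff > 1)[0] + 1)


# Hypothetical sample input: sorted integers with two gaps.
arr = np.array([1, 2, 3, 7, 8, 12])

groups_a = split_ediff1d(arr)
groups_b = split_slicing(arr)

# Both forms should yield the same runs of consecutive integers.
same = len(groups_a) == len(groups_b) and all(
    np.array_equal(a, b) for a, b in zip(groups_a, groups_b)
)
print(same, [g.tolist() for g in groups_b])
```

This only checks one signed-integer input; it does not say anything about speed.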
@SaFE-APIOpt if you're a real person: thank you for the suggestion. Did you check that your optimization affects any time-critical code? Are you able to run some very preliminary performance tests (e.g., with %timeit in IPython) that indicate that this will make a difference?
If we don't get a reply here in a week we will likely just close to reduce noise on the issue tracker.
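A preliminary timing check of the kind asked about above could be sketched with the standard-library timeit module, which measures the same thing as %timeit outside IPython. The array size, the random input, and the repeat counts are all assumptions picked for illustration, and no results are claimed here; the numbers depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

# Hypothetical input: 10,000 sorted unique integers with random gaps.
rng = np.random.default_rng(0)
arr = np.sort(rng.choice(100_000, size=10_000, replace=False))


def with_ediff1d():
    return np.split(arr, np.where(np.ediff1d(arr) - 1 > 0)[0] + 1)


def with_slicing():
    diff = arr[1:] - arr[:-1]
    return np.split(arr, np.where(diff > 1)[0] + 1)


timings = {}
for name, fn in (("ediff1d", with_ediff1d), ("slicing", with_slicing)):
    # Take the best of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(fn, number=20, repeat=3))
    timings[name] = best / 20  # seconds per call
    print(f"{name}: {timings[name] * 1e6:.1f} us per call")
```

Since np.split is part of both variants and may dominate the runtime, it would also be worth timing the difference computation on its own and repeating the test with much smaller arrays.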
No replies, closing for now.