Make apply more memory-friendly for CUDA

Open VidithM opened this issue 10 months ago • 0 comments

When doing an in-place apply where C is iso on input but not on output, and a non-positional operator is used, we need to realloc C->x and set all numerical entries to the iso value. However, doing so pins C->x on the host, which is bad for CUDA. This change defers the iso expansion to the appropriate point.
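For readers unfamiliar with iso expansion, here is a minimal host-only sketch of the idea, not the actual GraphBLAS code. The `toy_matrix` struct and the functions `toy_expand_iso`, `toy_apply_inplace`, and `toy_add_index` are hypothetical names invented for illustration; the real `GrB_Matrix` layout and `GB_apply_op` signature differ, and nothing here models CUDA memory placement. The operator depends on both the value and the entry position, so an iso input becomes non-iso on output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical stand-in for a matrix's value array; not the GrB_Matrix layout.
typedef struct
{
    double *x ;     // values: 1 entry if iso, nvals entries otherwise
    size_t nvals ;  // number of entries in the pattern
    bool iso ;      // true if every entry shares the single value x [0]
} toy_matrix ;

// Expand an iso matrix so that x holds nvals copies of the iso value.
// Returns 0 on success, -1 if allocation fails (C is then left unchanged).
int toy_expand_iso (toy_matrix *C)
{
    assert (C != NULL) ;
    if (!C->iso) return (0) ;               // already non-iso: nothing to do
    assert (C->x != NULL) ;
    double iso_value = C->x [0] ;
    size_t n = (C->nvals > 0) ? C->nvals : 1 ;
    double *x_new = malloc (n * sizeof (double)) ;
    if (x_new == NULL) return (-1) ;
    for (size_t k = 0 ; k < C->nvals ; k++)
    {
        x_new [k] = iso_value ;
    }
    free (C->x) ;
    C->x = x_new ;
    C->iso = false ;
    return (0) ;
}

// Example operator (assumed for illustration): uses the value and the
// position k of the entry, so the result is generally not iso.
double toy_add_index (double value, size_t k)
{
    return (value + (double) k) ;
}

// In-place apply: the expansion is deferred until the values are
// actually about to be overwritten.
int toy_apply_inplace (toy_matrix *C, double (*op) (double, size_t))
{
    if (toy_expand_iso (C) != 0) return (-1) ;
    for (size_t k = 0 ; k < C->nvals ; k++)
    {
        C->x [k] = op (C->x [k], k) ;
    }
    return (0) ;
}
```

In this sketch the expansion happens inside the apply routine itself, right before the values are written, rather than earlier in the caller. That placement is only an analogy for deferring the expansion; where the real code performs it is determined by the change itself.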

(Would it be better to instead change the API for GB_apply_op to take a do_iso_expansion flag? The drawback of the current solution is that the expansion may be performed when it is not needed, if C is not iso on input.)

Mar 12 '25 05:03 VidithM