YAKL icon indicating copy to clipboard operation
YAKL copied to clipboard

Add compile time option to make reshape dimension 64-bit

Open Goobley opened this issue 10 months ago • 3 comments

I am refactoring some code to be 2/3D agnostic, which involves making large arrays of order [Na, Nx, Ny, Nz] into [Na, Nx*Ny*Nz] (Na is typically ~10). These could easily end up overflowing the int storage used in Dims/Bounds for reshape. This PR adds YAKL_INT64_RESHAPE as a command line/preprocesser parameter to use int64 for Dims/Bounds when set. Default behaviour is maintained when not set.

Goobley avatar Apr 12 '24 12:04 Goobley

Hey, @Goobley , my apologies for the delay here. For some reason, github wasn't notifying me of PRs. I'm looking at this now.

mrnorman avatar May 21 '24 19:05 mrnorman

Admittedly, I wasn't anticipating per-node workloads of such a large size, so it might be worth a more comprehensive review of integer casting and potential loss of size_t capacities in places.

mrnorman avatar May 21 '24 19:05 mrnorman

I completely missed the notification on this too! I intend to do some reviewing of this soon, if you're okay keeping it open.

Goobley avatar Jun 28 '24 17:06 Goobley

@Goobley , after some thought, I can make the Array classes use bigger types for int, but I don't think I can do that for the parallel_for dispatch because CUDA and HIP backends only support unsigned int types in the backend looping indices. That feels kind of mismatched since you'd likely want to parallel dispatch over those larger-ranged indices. Also, if I support this in Array classes, I feel like I need to find a way to support it in the intrinsic functions that operate on them, which would also require dispatch to need this as well.

Do you happen to know if Kokkos supports this in View types? I haven't looked into this yet.

mrnorman avatar Oct 28 '24 14:10 mrnorman

Hi @mrnorman. Those are valid comments, and it's not worth changing anything right now given the massive backend change to kokkos. I think in general uint32 should be fine, especially if one allows for bigger loop bounds where the blockIdx can go up to the max uint32 and the threadIdx is say 128 or 256. Either way, I probably have to do some migrations following the backend change to kokkos as there's little point in my not using kokkos directly, other than originally choosing YAKL as something small and easy to modify!

Goobley avatar Oct 28 '24 15:10 Goobley

Sounds good. Please don't hesitate to re-open or open a new PR/Issue if something comes up, though.

mrnorman avatar Oct 28 '24 16:10 mrnorman