KernelAbstractions.jl Index type

Int32 can be quite a bit faster than Int64 for index arithmetic, so we should make sure we use it wherever we can in our index calculations.

May 10 '23 15:05 vchuravy

@luraess also mentioned that it would make sense to configure the hardware dimension index into the Kernel struct.

Jun 28 '23 15:06 vchuravy

Could you provide a function that would evaluate differently depending on the device? e.g.

IT = KernelAbstractions.IndexType()
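
To make the request concrete, here is a minimal sketch of what a device-dependent index type could look like. It is only an illustration of the dispatch pattern: SketchBackend, SmallDeviceSketch, LargeDeviceSketch, index_type and linear_index are hypothetical names invented for this example, not part of KernelAbstractions.jl.

```julia
# Hypothetical sketch: none of these names exist in KernelAbstractions.jl.
abstract type SketchBackend end
struct SmallDeviceSketch <: SketchBackend end  # assumed: all arrays fit 32-bit indices
struct LargeDeviceSketch <: SketchBackend end  # assumed: arrays may exceed that range

# Dispatch on the backend type to choose the integer type used for index math.
index_type(::SketchBackend) = Int64            # conservative default
index_type(::SmallDeviceSketch) = Int32        # opt in to the narrower type

# Compute a 1-based linear index from block/thread coordinates,
# keeping every intermediate value in the chosen index type.
function linear_index(backend::SketchBackend, block, threads_per_block, thread)
    IT = index_type(backend)
    return (IT(block) - one(IT)) * IT(threads_per_block) + IT(thread)
end

linear_index(SmallDeviceSketch(), 3, 256, 5)   # Int32 result
linear_index(LargeDeviceSketch(), 3, 256, 5)   # Int64 result
```

Because the type is selected by dispatch on the backend, the choice is resolved at compile time and the kernel body itself would not need to change.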

Jul 16 '23 18:07 simonbyrne

In which case, or on which device, would Int32 not be sufficient?

Sep 01 '23 20:09 brabreda

The maximum linear index with UInt32 is 4,294,967,295, so an array of about 4GB at one byte per element. With GPUs having upwards of 40GB of memory in the data center, it's not unlikely that a user wants to process something larger than that.

In particular for ML workloads.
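
The limits can be checked directly in Julia. This is a small sketch of the arithmetic only; max_bytes is a helper name made up for this example, and it assumes a dense array addressed by a single 1-based linear index.

```julia
# Largest values representable by the 32-bit index candidates.
max_i32 = typemax(Int32)    # 2147483647
max_u32 = typemax(UInt32)   # 0xffffffff == 4294967295

# Hypothetical helper: largest dense array, in bytes, whose last element
# can still be addressed by a 1-based linear index of type IT.
max_bytes(::Type{IT}, ::Type{T}) where {IT<:Integer, T} =
    Int64(typemax(IT)) * sizeof(T)

max_bytes(UInt32, UInt8)    # 4294967295 bytes, just under 4 GiB
max_bytes(Int32, Float32)   # 8589934588 bytes, just under 8 GiB
max_bytes(Int32, Float64)   # 17179869176 bytes, just under 16 GiB
```

So the byte limit depends on the element type, but each of these cases is well below the 40GB mentioned above.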

Sep 01 '23 22:09 vchuravy