KernelAbstractions.jl

Index type

Open vchuravy opened this issue 2 years ago • 6 comments

Int32 can be quite a bit faster and we should make sure that we use it where we can for our index calculations.

vchuravy avatar May 10 '23 15:05 vchuravy
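
One reason this needs care in Julia is that integer literals are `Int` (Int64 on 64-bit platforms), so mixing a literal into an index expression silently promotes the result. The sketch below shows one way to keep the arithmetic in Int32. The helper name `linear_index` and its arguments are hypothetical and are not part of KernelAbstractions.jl.

```julia
# Hypothetical helper, not a KernelAbstractions.jl API: compute a 1-based
# global linear index from a 1-based group index and a 1-based local index.
function linear_index(group::Int32, groupsize::Int32, lindex::Int32)
    # one(Int32) keeps the arithmetic in Int32; a literal `1` would promote to Int.
    return (group - one(Int32)) * groupsize + lindex
end

idx = linear_index(Int32(3), Int32(256), Int32(5))

# Mixing in a plain literal promotes to the platform Int instead.
promoted = Int32(3) - 1
```

Here `idx` stays an `Int32`, while `promoted` has type `Int`.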

@luraess also mentioned that it would make sense to configure the hardware dimension index into the Kernel struct.

vchuravy avatar Jun 28 '23 15:06 vchuravy

Could you provide a function that would evaluate differently depending on the device? e.g.

IT = KernelAbstractions.IndexType()

simonbyrne avatar Jul 16 '23 18:07 simonbyrne
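
A device-dependent index type could be sketched with ordinary dispatch, roughly as follows. Everything here is an assumption for illustration: `AbstractDevice`, the two device structs, and `index_type` are hypothetical names, not the `IndexType` function proposed above and not existing KernelAbstractions.jl API.

```julia
# Hypothetical sketch; none of these names are KernelAbstractions.jl APIs.
abstract type AbstractDevice end
struct SmallMemoryDevice <: AbstractDevice end  # stand-in: Int32 indices suffice
struct LargeMemoryDevice <: AbstractDevice end  # stand-in: arrays may exceed the Int32 range

# Default to the platform Int, and opt in to Int32 per device type.
index_type(::AbstractDevice) = Int
index_type(::SmallMemoryDevice) = Int32

IT = index_type(SmallMemoryDevice())
i = IT(42)  # an index value of the device's preferred type
```

Defaulting to `Int` and opting in to `Int32` keeps the fallback safe for devices whose arrays may exceed the 32-bit range.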

In which case or on which device would Int32 not be sufficient?

brabreda avatar Sep 01 '23 20:09 brabreda

The maximum linear index with UInt32 is 4,294,967,295, so an array of about 4 GB at one byte per element (Int32 tops out at 2,147,483,647, half of that). With GPUs having 40 GB or more of memory in the data center, it's not unlikely that a user wants to process something larger than that.

In particular in ML.

vchuravy avatar Sep 01 '23 22:09 vchuravy
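
The limits quoted above can be checked with a few lines of arithmetic. This is only a sketch: the helper names `bytes_covered` and `gib` are made up for illustration, and the element types are arbitrary examples.

```julia
# Largest linear index representable in each candidate index type.
max_u32 = Int64(typemax(UInt32))
max_i32 = Int64(typemax(Int32))

# Bytes covered by an array with n elements of type T.
bytes_covered(n, T) = n * sizeof(T)

# Convert a byte count to GiB.
gib(bytes) = bytes / 2^30

u8_gib  = gib(bytes_covered(max_u32, UInt8))    # one-byte elements, UInt32 indices
f32_gib = gib(bytes_covered(max_i32, Float32))  # four-byte elements, Int32 indices
```

With one-byte elements a UInt32 index covers just under 4 GiB, and with Float32 elements an Int32 index covers just under 8 GiB, both well below the memory of a 40 GB device.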