Severin Dicks
Hey @felixpetschko I looked at your kernels and they technically work, but you can make them even better. I would advocate using prefetching and also using one block...
Hey @felixpetschko can you send me a larger dataset to test this? I have some ideas and want to see if this works.
@grst I talked to @Zethson and he adjusted the setting in cirun. The test is now running but failing.
For a large matrix in the Hamming kernel, have you checked that int is enough to cover the indexing of the data?
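A rough way to sanity-check this before launching the kernel, as a minimal sketch only: it assumes the kernel computes flat indices of the form `row * n_cols + col` in a signed 32-bit `int`, which may not match the actual Hamming kernel. The function names `fits_int32_indexing` and `index_type_for` are hypothetical and not part of any library.

```python
# Largest value a signed 32-bit int can hold.
INT32_MAX = 2**31 - 1


def fits_int32_indexing(n_rows: int, n_cols: int) -> bool:
    """Return True if the largest flat index (n_rows * n_cols - 1) fits in int32.

    Assumes row-major flat indexing of the form row * n_cols + col.
    """
    return n_rows * n_cols - 1 <= INT32_MAX


def index_type_for(n_rows: int, n_cols: int) -> str:
    """Suggest an index type name for a matrix of the given shape."""
    return "int32" if fits_int32_indexing(n_rows, n_cols) else "int64"


if __name__ == "__main__":
    # A 50,000 x 50,000 matrix has 2.5e9 entries, more than int32 can index.
    print(index_type_for(50_000, 50_000))
```

If the check fails, the indexing variables in the kernel would need a 64-bit type (or the work would need to be chunked) to avoid silent overflow.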
I haven't added a release note yet because I'm not sure if this should already go into 1.10.4.
Usually you can ignore those. I know it looks bad but it should work regardless.
This will never work in any context. CUDA doesn't support int64 (long long int) atomics. When working with dense arrays, please use float32 or float64. Sparse operations already enforce this by...
@MPebworthEpana From what I can see, I guess your chunks are too big. At least that's what I think from seeing this: `RuntimeError: 2 of 2 worker...
Can you try this function without putting the matrix onto the GPU?
@lukasadam thank you so much for bringing this up. I was not aware of this happening. Do you have a reproducer for this issue? I know that rsc uses 32...