James

20 comments by James

>This would require simply updating the example implementations for each backend, and any users who are using them would be automagically good, and in the future, users who are looking...

Hello! I had a look at this PR. I have no idea if this feedback is welcome, but I figured I might as well. This seems to both be a...

Indeed, the implementation (V8) still does exactly that; it just also contains the bug that flushes any uint64 which is a NaN when interpreted as a double to the same NaN...
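To make the NaN issue concrete, here is a minimal Python sketch of what "flushing to the same NaN" means at the bit level. It is a model under stated assumptions, not V8's code: the helper names are hypothetical, and the canonical pattern `0x7FF8000000000000` is only an assumed choice of quiet NaN.

```python
# Assumed canonical quiet-NaN bit pattern; the real engine's choice may differ.
CANONICAL_NAN_BITS = 0x7FF8000000000000

MANTISSA_MASK = (1 << 52) - 1


def is_nan_pattern(bits: int) -> bool:
    """True if the 64-bit pattern is a NaN when read as an IEEE 754 double."""
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & MANTISSA_MASK
    # NaN: exponent all ones and a non-zero mantissa (zero mantissa is infinity).
    return exponent == 0x7FF and mantissa != 0


def store_as_double(bits: int) -> int:
    """Hypothetical model of a store that collapses every NaN to one pattern."""
    return CANONICAL_NAN_BITS if is_nan_pattern(bits) else bits
```

In this model, two different uint64 states such as `0x7FF0000000000001` and `0xFFF8000000000123` both come back as the same value, so the generator state is corrupted whenever it happens to look like a NaN. That covers 2 × (2^52 − 1) of the 2^64 possible patterns, roughly 1 in 2048.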

Chrome updated their implementation after I reported the NaN issue; it no longer uses xorshift128+ but a variant instead.

After realising that my IDE scrubs environment variables and being puzzled for a while, setting that environment variable does indeed result in a correct-looking RGP trace: https://i.imgur.com/MoKHvBS.png Which at...

Thanks very much for the replies. Do you happen to know if this is something that affects HIP as well? I've been considering switching APIs to get better performance, but...

Some brief testing has shown that using a ring of command queues (in my case 16) to submit parallel work, and using markers to synchronise does seem to successfully recover...
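As a rough illustration of the ring idea, here is a pure-Python model; it does not call any GPU API. `QueueRing`, `submit`, and `marker` are hypothetical names, and the marker is modelled simply as a point at which everything submitted so far is treated as complete.

```python
class QueueRing:
    """Hypothetical model: hands work to a fixed ring of queues, round-robin."""

    def __init__(self, size: int = 16):
        self.size = size
        self._next = 0
        # One list of pending work items per modelled command queue.
        self.pending = [[] for _ in range(size)]

    def submit(self, work: str) -> int:
        """Place work on the next queue in the ring and return its index."""
        idx = self._next
        self.pending[idx].append(work)
        self._next = (idx + 1) % self.size
        return idx

    def marker(self) -> list:
        """Model a synchronisation point: drain every queue and return the work."""
        done = [w for queue in self.pending for w in queue]
        self.pending = [[] for _ in range(self.size)]
        return done
```

With a ring of 4, six submissions land on queues 0, 1, 2, 3, 0, 1, and a single `marker()` call then accounts for all six before later work is allowed to depend on them.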

In the end, I was able to create a longer-term solution for myself by inspecting the arguments to kernels for their read/write flags, and then automatically distributing work across...
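One way such flag-based distribution could work is sketched below in plain Python. This is an assumption about the general approach, not the actual implementation: `Kernel`, `conflicts`, and `schedule` are hypothetical names, buffers are identified by strings, and queues are assigned round-robin.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    name: str
    reads: set = field(default_factory=set)   # buffers flagged read-only
    writes: set = field(default_factory=set)  # buffers flagged writable


def conflicts(earlier: Kernel, later: Kernel) -> bool:
    """True if `later` must wait for `earlier` (RAW, WAW, or WAR hazard)."""
    raw_or_waw = earlier.writes & (later.reads | later.writes)
    war = earlier.reads & later.writes
    return bool(raw_or_waw or war)


def schedule(kernels, num_queues: int = 16):
    """Return (kernel name, queue index, names of kernels to wait on) tuples."""
    plan = []
    for i, k in enumerate(kernels):
        deps = [p.name for p in kernels[:i] if conflicts(p, k)]
        plan.append((k.name, i % num_queues, deps))
    return plan
```

Two kernels that only read the same buffer get no dependency and can run on different queues; a later kernel that writes that buffer, or reads what they wrote, waits on both.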

I've been playing around with this a lot since I wrote this bug report, and it's becoming an ever bigger thorn in trying to get acceptable GPU performance out of AMD's...

Thanks very much for the update, it's nice to see that this has been reproduced and fixed so quickly!