Christopher Kulla
+1 to needing the grazing angle control. I would even argue that having a third parameter to control the transition between the two colors is also important in many cases....
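As a rough illustration of the three controls mentioned above, here is a minimal sketch of a Schlick-style blend between a facing color and a grazing color, with an exponent shaping the transition. The names `Color`, `grazing_blend`, `facing`, `grazing`, and `exponent` are hypothetical and not taken from any particular library; the thread itself does not say which formula is meant.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical RGB triple used only for this illustration.
struct Color { float r, g, b; };

// Blend from `facing` (view along the normal) to `grazing` (view at 90 degrees).
// `exponent` is the assumed third parameter: larger values push the transition
// closer to the grazing angle; 5 corresponds to the classic Schlick falloff.
Color grazing_blend(Color facing, Color grazing, float cos_theta, float exponent) {
    float c = std::min(std::max(cos_theta, 0.0f), 1.0f);  // clamp to [0, 1]
    float t = std::pow(1.0f - c, exponent);               // 0 when facing, 1 at grazing
    return { facing.r + (grazing.r - facing.r) * t,
             facing.g + (grazing.g - facing.g) * t,
             facing.b + (grazing.b - facing.b) * t };
}
```

With `exponent` fixed, only the two endpoint colors are controllable; exposing it lets the width of the transition be tuned separately.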
It's probably OK, but I think it would be better to explicitly cast to `unsigned long long` here, since the intent is to inspect the 64-bit value in the double....
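A minimal sketch of what such an explicit cast might look like, assuming the code under review formats the bit pattern of a `double` with a `%llx`-style specifier (the original code is not shown here, and `double_bits_hex` is a hypothetical helper name). The bits are copied out with `std::memcpy`, and the cast makes the argument type match the format specifier even on platforms where `std::uint64_t` is `unsigned long`.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: return the 64-bit pattern of a double as 16 hex digits.
std::string double_bits_hex(double d) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);  // copy the raw bits without aliasing issues
    char buf[17];                         // 16 hex digits plus the terminating NUL
    // %llx expects unsigned long long, so cast explicitly to state the intent.
    std::snprintf(buf, sizeof buf, "%016llx",
                  static_cast<unsigned long long>(bits));
    return std::string(buf);
}
```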
Would you be able to try against the latest version of the library? It looks like the way we codegen fmod has changed a bit.
This is a good clue. Given that the same code works in parallel for the CPU case, maybe the problem is somewhere in the PTX backend itself? Have you tried...
As a totally wild guess -- the `prototype_724` strings suggest there is some sequential numbering of symbols for the PTX code. Perhaps the NVPTX backend is using an unprotected global...
I did a bit of grepping through the LLVM source. And sure enough, the `prototype_%d` is assembled using a static counter. So I'm afraid the bug is deep inside LLVM....
No, not yet -- from my limited read through the code, it's not totally obvious if simply making it an atomic will fix it, or if there will be other...
Even if the patch is incomplete - it's at least worth submitting, right? It definitely was an obviously non-thread-safe spot in the code. Maybe the review process would help...
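To make the suspected race concrete, here is a minimal sketch of a name generator that draws its suffix from a shared counter. This is not LLVM's actual code: `next_prototype_name` and its counter are hypothetical stand-ins. With a plain `static unsigned`, two threads could read the same value and emit the same symbol name; an atomic `fetch_add` hands each caller a distinct value, though that alone may not cover every shared-state issue in the real backend.

```cpp
#include <atomic>
#include <string>

// Hypothetical stand-in for a "prototype_%d" style name generator.
std::string next_prototype_name() {
    // One counter shared by all callers; fetch_add returns the previous value
    // and increments atomically, so concurrent callers never see the same id.
    static std::atomic<unsigned> counter{0};
    unsigned id = counter.fetch_add(1, std::memory_order_relaxed);
    return "prototype_" + std::to_string(id);
}
```

Relaxed ordering is enough in this sketch because only the uniqueness of the returned value matters, not its ordering relative to other memory operations.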
In the patch above, I am curious why you are incrementing uniqueCallSiteCount in two places? The original code only had one increment. I can't imagine it makes much difference either...
Never mind - I think I see why you need two.