cccl
Investigate where time is being spent in cuda.coop tests
Each test invocation seems to take roughly 0.8-1.5 seconds on fast, modern hardware. Is that SOL (speed of light)? Investigate what happens during each test to see where the time is going. An strace of an isolated test would be a good start.
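As a possible starting point, here is a minimal sketch of how the measurement could be set up. It assumes the tests are run through pytest and that `strace` is installed; the test id in the comments is a hypothetical placeholder, not a real path in the repo. The helpers only build the `strace` command line and measure wall-clock time, so the interpreter startup baseline can be compared against a single test run.

```python
import subprocess
import sys
import time


def strace_argv(test_cmd, log_path="strace.log"):
    """Wrap test_cmd in strace.

    -f  follows child processes (e.g. any compiler the test spawns),
    -tt prints a microsecond timestamp on every syscall,
    -T  prints the time spent inside each syscall,
    -o  writes the trace to log_path instead of stderr.
    """
    return ["strace", "-f", "-tt", "-T", "-o", log_path, *test_cmd]


def wall_time(cmd):
    """Run cmd to completion; return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, proc.returncode


# Baseline: how long does a bare interpreter take to start and exit?
#   baseline, _ = wall_time([sys.executable, "-c", "pass"])
#
# One isolated test (hypothetical test id, replace with a real one):
#   test_cmd = [sys.executable, "-m", "pytest", "-q",
#               "path/to/test_file.py::test_name"]
#   elapsed, rc = wall_time(test_cmd)
#
# Same test under strace, with the trace written to strace.log:
#   wall_time(strace_argv(test_cmd))
```

Comparing the baseline against the single-test time gives a rough upper bound on how much of the 0.8-1.5 seconds is attributable to the test itself rather than interpreter startup. Note that strace adds overhead of its own, so the timestamps in the log are more useful for relative ordering than for absolute numbers.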