gras
Convenience settings in the thread pool to emulate TPB and STS:
- TPB - thread per block - one thread group of size one per block
- STS - single-threaded scheduler - one thread group of size one shared by all blocks
There is an env var: set GRAS_TPP=1 to get thread per block. We probably want an API call for this.
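
As a rough illustration of the two presets and the env var check, here is a minimal C++ sketch. It is not the GRAS implementation: PoolPreset, PoolLayout, layout_for, and tpb_requested are hypothetical names made up for this example, and the layout numbers only reflect one reading of the descriptions above (one group of one thread per block for TPB, a single group of one thread for STS). Only the GRAS_TPP variable name comes from the note itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical names for the two presets described above (not GRAS API).
enum class PoolPreset { ThreadPerBlock, SingleThreaded };

// Hypothetical description of a thread pool layout.
struct PoolLayout
{
    std::size_t num_groups;        // how many thread groups exist
    std::size_t threads_per_group; // threads in each group
};

// Map a preset to a layout for a flow graph with num_blocks blocks.
inline PoolLayout layout_for(PoolPreset preset, std::size_t num_blocks)
{
    assert(num_blocks > 0);
    if (preset == PoolPreset::ThreadPerBlock)
    {
        // TPB: every block gets its own group holding a single thread.
        return PoolLayout{num_blocks, 1};
    }
    // STS: one group with a single thread services all blocks.
    return PoolLayout{1, 1};
}

// True when the given env var value asks for thread per block ("1").
inline bool tpb_requested(const char *value)
{
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads GRAS_TPP from the process environment.
inline bool tpb_requested_from_env()
{
    return tpb_requested(std::getenv("GRAS_TPP"));
}
```

Separating tpb_requested from the getenv call keeps the parsing testable without touching the environment; an eventual API call could take a preset value directly and bypass the env var.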