accelerated-scan
Log-space version
Feng et al. proposed a log-space implementation of the parallel scan for improved numerical stability. It should be fairly easy to implement, but my CUDA skills are a bit rusty, so before I attempt an implementation myself I wanted to ask whether you already have this planned or in progress.
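For concreteness, here is a minimal reference sketch of the log-space formulation as I understand it, in plain Python rather than CUDA. It assumes the recurrence `h_t = a_t * h_{t-1} + b_t` with strictly positive `a_t` and non-negative `b_t` and `h_0`, so that everything has a real logarithm. The names `log_space_scan` and `logaddexp` are my own and not part of this library. The loop is sequential for readability, but both running quantities (a cumulative sum and a cumulative log-sum-exp) are associative, so each could be computed with a parallel scan.

```python
import math


def logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y)), allowing -inf inputs."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def log_space_scan(log_a, log_b, log_h0=-math.inf):
    """Return [log h_1, ..., log h_T] for h_t = a_t * h_{t-1} + b_t.

    Assumes a_t > 0 and b_t, h_0 >= 0 (hypothetical interface for illustration).
    Uses the identity
        log h_t = A_t + log(h_0 + sum_{k<=t} exp(log b_k - A_k)),
    where A_t = sum_{i<=t} log a_i.
    """
    out = []
    a_star = 0.0   # running cumulative sum of log a_i (A_t)
    acc = log_h0   # running log-sum-exp of log h_0 and (log b_k - A_k)
    for la, lb in zip(log_a, log_b):
        a_star += la
        acc = logaddexp(acc, lb - a_star)
        out.append(a_star + acc)
    return out


if __name__ == "__main__":
    a = [0.5, 0.9, 0.1]
    b = [1.0, 2.0, 3.0]
    log_h = log_space_scan([math.log(x) for x in a], [math.log(x) for x in b])
    print([math.exp(v) for v in log_h])
```

A CUDA version would presumably replace the loop with two scans (sum, then log-sum-exp) or fuse them into a single scan over pairs, but that design choice is left open here.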