cuda-kat
append_to_global_memory() for disparate per-thread data?
I wonder whether we should consider a version of append_to_global_memory() in which each thread's data may reside elsewhere (i.e. at some per-thread address); and perhaps also a version in which each thread has some data that is guaranteed to be in registers (e.g. with a capped common size, so that we could perhaps use a kat::array).
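To make the two proposed variants more concrete, here is a minimal host-side C++ model of the usual warp-level append technique: an exclusive prefix sum of per-thread lengths, a single atomic reservation of the warp's combined range, then per-thread copies. It is a sketch under stated assumptions, not cuda-kat's API. The names reserve_warp_range, append_from_addresses and append_from_registers are hypothetical, the 32 lanes are simulated by sequential loops rather than real threads, and a fixed-size std::array stands in for kat::array in the register-resident variant.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Host-side model only: the 32 "lanes" of a warp are simulated by loops.
// All function names below are hypothetical, not part of cuda-kat.
constexpr std::size_t warp_size = 32;

// Exclusive prefix sum of the per-lane lengths, followed by a single atomic
// reservation of the warp's combined range. Returns each lane's write position.
inline std::array<std::size_t, warp_size> reserve_warp_range(
    std::atomic<std::size_t>& global_length,
    const std::array<std::size_t, warp_size>& lengths)
{
    std::array<std::size_t, warp_size> positions{};
    std::size_t total = 0;
    for (std::size_t lane = 0; lane < warp_size; ++lane) {
        positions[lane] = total;  // number of elements written by lower lanes
        total += lengths[lane];
    }
    // On a GPU, one lane would do this atomicAdd and broadcast the result.
    const std::size_t base = global_length.fetch_add(total);
    for (auto& p : positions) p += base;
    return positions;
}

// Variant 1 (hypothetical): each lane's data lives at its own address.
template <typename T>
void append_from_addresses(
    T* global_output,
    std::atomic<std::size_t>& global_length,
    const std::array<const T*, warp_size>& lane_data,
    const std::array<std::size_t, warp_size>& lane_lengths)
{
    const auto positions = reserve_warp_range(global_length, lane_lengths);
    for (std::size_t lane = 0; lane < warp_size; ++lane) {
        // A lane with zero length may pass a null pointer.
        assert(lane_lengths[lane] == 0 || lane_data[lane] != nullptr);
        for (std::size_t i = 0; i < lane_lengths[lane]; ++i)
            global_output[positions[lane] + i] = lane_data[lane][i];
    }
}

// Variant 2 (hypothetical): each lane holds at most MaxPerLane elements in a
// fixed-size array (std::array standing in for kat::array, i.e. registers).
template <typename T, std::size_t MaxPerLane>
void append_from_registers(
    T* global_output,
    std::atomic<std::size_t>& global_length,
    const std::array<std::array<T, MaxPerLane>, warp_size>& lane_values,
    const std::array<std::size_t, warp_size>& lane_counts)
{
    // Clamp counts to the cap so the reserved range matches what is written.
    std::array<std::size_t, warp_size> counts{};
    for (std::size_t lane = 0; lane < warp_size; ++lane)
        counts[lane] = std::min(lane_counts[lane], MaxPerLane);

    const auto positions = reserve_warp_range(global_length, counts);
    for (std::size_t lane = 0; lane < warp_size; ++lane)
        for (std::size_t i = 0; i < counts[lane]; ++i)
            global_output[positions[lane] + i] = lane_values[lane][i];
}
```

On an actual GPU, the scan would more likely be done with warp shuffles, and only one lane would issue the atomicAdd and broadcast the base offset. A trade-off worth noting: in the first variant the source reads are scattered, so only the writes benefit from the contiguous reservation, while the capped-size variant avoids indirect loads at the cost of a compile-time bound on per-thread data.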