runtime
                        Add asynchronous copy
Add asynchronous copy operation anydsl_copy_async.
The "async" is only a hint: the copy is actually asynchronous only on CUDA and OpenCL. I did not find a suitable method for HSA. The CPU could support async copies, but the host is usually treated as a single unit without async capabilities, so it was intentionally not added.
Tested with Rodent (Artic).
If the copy is asynchronous, how do you know when it has finished? A device-wide barrier?
Yes. Unfortunately, the API gives no access to streams or other finer-grained barriers. Finding a common set across all the device types we support is quite difficult, especially because of OpenCL. :/
If you have an idea for finer-grained barriers, feel free to mention it. I am very interested in that :D
For HSA, you can use hsa_amd_memory_async_copy on AMD GPUs.
The HSA function requires signals (which might also be useful for events [other PR]). What would be the best practice for providing one per call without exposing them to the AnyDSL user? A platform- or device-specific list of current signals?
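The per-device signal list floated above could look roughly like the sketch below. It is a minimal, hypothetical model, not AnyDSL code. `Platform`, `Device`, `copy_async`, `synchronize` and `pending_count` are illustrative names, and the `std::future` returned by `std::async` stands in for a completion signal. In an HSA backend, each entry would instead be the `hsa_signal_t` passed as `completion_signal` to `hsa_amd_memory_async_copy`. On platforms that do not honour the hint, the copy runs synchronously. `synchronize` acts as the device-wide barrier that drains the list, so the user never handles a signal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <vector>

// Hypothetical platform tags. Following the PR description, only CUDA and
// OpenCL honour the async hint; everything else copies synchronously.
enum class Platform { Host, Cuda, OpenCL, Hsa };

bool honours_async_hint(Platform p) {
    return p == Platform::Cuda || p == Platform::OpenCL;
}

// Hypothetical per-device state. Each pending entry plays the role of a
// completion signal (e.g. an hsa_signal_t); std::future is only a stand-in.
class Device {
public:
    explicit Device(Platform p) : platform_(p) {}

    // Async copy as a hint: the signal is created and stored internally,
    // so the caller never sees it.
    void copy_async(void* dst, const void* src, std::size_t size) {
        if (!honours_async_hint(platform_)) {
            std::memcpy(dst, src, size);  // fallback: complete on return
            return;
        }
        pending_.push_back(std::async(std::launch::async, [dst, src, size] {
            std::memcpy(dst, src, size);
        }));
    }

    // Device-wide barrier: wait on every outstanding signal, then drop them.
    void synchronize() {
        for (auto& signal : pending_) signal.wait();
        pending_.clear();
    }

    std::size_t pending_count() const { return pending_.size(); }

private:
    Platform platform_;
    // Not guarded by a mutex: assumes a single thread issues copies.
    std::vector<std::future<void>> pending_;
};
```

Usage: issue `copy_async`, do other work, and then call `synchronize` before reading the destination buffer. Because the list is per device, a finer-grained barrier could later be built by handing out an index into it. That is an extension this sketch does not attempt.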