Implement copy!()
Given two arrays, e.g. a host array arr and a device array arr_d, copy!(arr_d, arr) should copy arr into arr_d. Currently it fails with a scalar indexing error.
It should also work for device-to-host and device-to-device copies.
This is implemented in CUDA.jl.
This should already work: https://github.com/JuliaGPU/AMDGPU.jl/blob/bd8b4c400bd440c571beb33f9921be28323ce573/src/array.jl#L178-L214
Can you provide an MWE where this doesn't work?
Those snippets are copyto!(), not copy!(), right?
What's the difference between copy! and copyto!? From the docstring, copy! sounds very similar to copyto!.
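For reference, here is a minimal sketch of how the two functions differ in Base Julia, using plain CPU Arrays only (no AMDGPU.jl or GPU needed). It reflects Base's documented behavior: copy! makes dst an exact copy of src, while copyto! just writes src's elements into dst starting at its first index. The variable names are illustrative.

```julia
# Base Julia only: contrast copy! and copyto! on ordinary Arrays.

src = [1.0, 2.0, 3.0]

# copyto!: writes src into dst starting at the first index.
# dst keeps its length; trailing elements are left untouched.
dst1 = zeros(5)
copyto!(dst1, src)        # dst1 == [1.0, 2.0, 3.0, 0.0, 0.0]

# copy! on vectors: dst becomes an exact copy of src,
# so dst is resized to length(src).
dst2 = zeros(5)
copy!(dst2, src)          # dst2 == [1.0, 2.0, 3.0]

# copy! on multidimensional arrays: dst and src must have equal axes.
m_src = [1 2; 3 4]
m_dst = zeros(Int, 2, 2)
copy!(m_dst, m_src)       # OK: both are 2x2

# Mismatched axes make copy! throw instead of partially copying.
threw = try
    copy!(zeros(Int, 3, 3), m_src)
    false
catch
    true
end
```

In Base, the generic AbstractArray method of copy! performs these axes (or, for vectors, length and offset) checks and then delegates to copyto!, so the element-copying work itself is done by copyto!.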
Scalar indexing no longer happens.