What are the ways to improve this project?
Does it make sense to add list and forward_list containers? What are some ways to optimize this project on the HIP side?
Thanks for your interest in stdgpu!
Supporting more container types would definitely be appreciated. For list and forward_list, however, getting the design right on the GPU does not seem trivial. One idea would be to take inspiration from the internal collision handling of stdgpu::unordered_map (and stdgpu::unordered_set) when implementing such a container.
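To make that idea a bit more concrete, here is a rough host-side sketch of one possible direction. It is not stdgpu code and does not claim to mirror how stdgpu::unordered_map works internally. It only assumes a design where all nodes live in one preallocated pool and link to each other by index rather than by pointer. The names `pool_forward_list` and `no_node` are hypothetical, and `std::atomic` stands in for the device atomics (for example `atomicAdd` and `atomicCAS`) that a GPU version would need.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel index meaning "no node".
constexpr std::int32_t no_node = -1;

// Hypothetical sketch, NOT part of stdgpu: a fixed-capacity singly linked list
// whose nodes are stored in a preallocated pool and linked by index.
template <typename T>
class pool_forward_list
{
public:
    explicit pool_forward_list(std::size_t capacity)
      : _values(capacity), _next(capacity, no_node), _head(no_node), _claimed(0)
    {
    }

    // Lock-free insertion at the front.
    // Returns false once every slot of the pool has been handed out.
    bool push_front(const T& value)
    {
        // Claim a unique slot; an atomic counter avoids any dynamic allocation.
        const std::size_t slot = _claimed.fetch_add(1);
        if (slot >= _values.size())
        {
            return false;
        }
        _values[slot] = value;

        // Publish the node: point it at the current head, then swing the head
        // to it with compare-and-swap, retrying if another thread won the race.
        const std::int32_t node = static_cast<std::int32_t>(slot);
        std::int32_t old_head = _head.load();
        do
        {
            _next[slot] = old_head;
        } while (!_head.compare_exchange_weak(old_head, node));
        return true;
    }

    // Visits the values from front to back.
    // Intended for use once all insertions have finished.
    template <typename F>
    void for_each(F f) const
    {
        for (std::int32_t i = _head.load(); i != no_node; i = _next[i])
        {
            f(_values[i]);
        }
    }

    // Number of slots handed out so far, clamped to the capacity.
    std::size_t size() const
    {
        const std::size_t claimed = _claimed.load();
        return claimed < _values.size() ? claimed : _values.size();
    }

private:
    std::vector<T> _values;          // node payloads
    std::vector<std::int32_t> _next; // index of the successor, or no_node
    std::atomic<std::int32_t> _head; // index of the first node
    std::atomic<std::size_t> _claimed;
};
```

Insertion is the easy part. Erasing nodes and recycling their slots concurrently is where the design gets hard (ABA problems, safe reuse of freed slots), which is a large part of why these containers are not trivial on the GPU.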
Regarding the HIP backend: Although its design is fairly similar to the CUDA backend, it is still considered experimental due to a lack of suitable hardware for testing and some rough edges in the ROCm SDK in general. ROCm 6.x also seems to be incompatible with previous versions, so any contributions that improve support there are welcome.