Julian Samaroo
                                    To prevent failed GPU allocations when we actually have free memory
Changes signal waiting on the host to use a clamped exponential backoff, with the option for users to define their own backoff implementations.
Closes #84
Todo:
- [ ] Test...
Polling signals is generally a bad idea when one has many, many signals with long delays being waited on. While #84 will help with that, it would be more ideal...
We shouldn't need to wait on the whole stream to finish, just the portion of it that contains our launched kernels.
This limits the number of executables we keep cached, in the event that the user is generating a lot of them in a single session (such as for genetic/evolutionary ML)....
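A rough sketch of what a size-limited cache could look like. The eviction policy is not stated above, so least-recently-used eviction is an assumption here, and `BoundedCache` and `cached!` are hypothetical names, not AMDGPU.jl API:

```julia
# Hypothetical bounded cache. The names and the LRU policy are assumptions
# made for illustration; this is not the package's implementation.
struct BoundedCache{K,V}
    maxsize::Int
    entries::Dict{K,V}
    order::Vector{K}   # least recently used key first
end

function BoundedCache{K,V}(maxsize::Integer) where {K,V}
    maxsize >= 1 || throw(ArgumentError("maxsize must be at least 1"))
    return BoundedCache{K,V}(maxsize, Dict{K,V}(), K[])
end

# Return the cached value for `key`, calling `build()` only on a miss.
function cached!(build, cache::BoundedCache{K,V}, key::K) where {K,V}
    if haskey(cache.entries, key)
        # Hit: move the key to the most-recently-used end.
        filter!(k -> k != key, cache.order)
        push!(cache.order, key)
        return cache.entries[key]
    end
    value = build()
    if length(cache.order) >= cache.maxsize
        # Full: drop the least recently used entry before inserting.
        oldest = popfirst!(cache.order)
        delete!(cache.entries, oldest)
    end
    cache.entries[key] = value
    push!(cache.order, key)
    return value
end
```

The linear scan in `filter!` keeps the sketch short; a real cache holding many executables would likely want a linked structure so that hits stay cheap.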
We should be able to guess how well a given kernel can occupy a given piece of hardware. We should then be able to allow `@roc groupsize=auto ...` to automatically...
As pointed out in https://github.com/JuliaGPU/AMDGPU.jl/issues/68#issuecomment-791425492, objects like `RuntimeEvent{HSAStatusSignal}` print as some monstrosity that can easily be mistaken for an error. We should make sure that all user-facing objects print decently.
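For illustration, this is roughly how a wrapper type can be given readable output by overloading `Base.show`. `DemoSignal` and `DemoEvent` are hypothetical stand-ins, not the real `RuntimeEvent` or `HSAStatusSignal` types, and the chosen text is only one possible format:

```julia
# Hypothetical stand-in types, used only to demonstrate the printing hooks.
struct DemoSignal
    handle::UInt64
end

struct DemoEvent{S}
    signal::S
end

# Compact one-line form: used by print, string interpolation, and when the
# value appears inside a container.
function Base.show(io::IO, ev::DemoEvent{S}) where {S}
    print(io, "DemoEvent{", nameof(S), "}(handle=0x",
          string(ev.signal.handle, base=16), ")")
end

# Multi-line form: used when the REPL displays the value on its own.
function Base.show(io::IO, ::MIME"text/plain", ev::DemoEvent{S}) where {S}
    println(io, "DemoEvent wrapping a ", nameof(S))
    print(io, "  signal handle: 0x", string(ev.signal.handle, base=16))
end
```

Defining both methods keeps the output short inside arrays and tuples while still giving a fuller description at the REPL.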
For debugging purposes, it would be helpful to keep executables around so that they can be inspected.
It should start at a bit longer than the minimum possible kernel launch-and-complete latency, and then go up to a user-defined maximum.
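A delay schedule of this shape could be sketched as below, assuming the delay grows exponentially between the starting value and the maximum. `AbstractBackoff`, `ClampedExponentialBackoff`, `delay` and `wait_until` are hypothetical names, not AMDGPU.jl API, and no real launch latency figure is implied by the example values:

```julia
# Hypothetical interface; none of these names come from AMDGPU.jl.
abstract type AbstractBackoff end

struct ClampedExponentialBackoff <: AbstractBackoff
    initial::Float64     # first delay in seconds
    factor::Float64      # growth factor applied after each failed poll
    max_delay::Float64   # user-defined upper bound in seconds
end

# Delay before retry number `attempt` (1, 2, ...), clamped to `max_delay`.
delay(b::ClampedExponentialBackoff, attempt::Integer) =
    min(b.initial * b.factor^(attempt - 1), b.max_delay)

# Poll `isdone()` until it returns true, sleeping between polls.
# Returns false if `timeout` seconds elapse first.
function wait_until(isdone, b::AbstractBackoff; timeout::Real = Inf)
    start = time()
    attempt = 1
    while !isdone()
        time() - start > timeout && return false
        sleep(delay(b, attempt))
        attempt += 1
    end
    return true
end
```

A different schedule would only need a new subtype of `AbstractBackoff` with its own `delay` method; `wait_until` stays unchanged.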