Julian Samaroo issues

Results: 168 issues by Julian Samaroo

Implement memory reclaim mechanism similar to CUDA's

This prevents GPU allocations from failing when free memory is actually available.

bug

hsa

Implement exponential backoff signal wait

Changes signal waiting on the host to use a clamped exponential backoff, with the option for users to define their own backoff implementations. Closes #84. Todo: - [ ] Test...

performance

Implement batched off-thread HSA signal waiting

Polling signals is generally a bad idea when one has many, many signals with long delays being waited on. While #84 will help with that, it would be more ideal...

hsa

performance

[Mark/Wait] Use HIP events to do fine-grained sync

We shouldn't need to wait on the whole stream to finish, just the portion of it that contains our launched kernels.

performance

Use LRU for rocfunction_cache

This limits the number of executables we keep cached, in the event that the user is generating a lot of them in a single session (such as for genetic/evolutionary ML)....

enhancement

speculative
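
The bounded-cache idea described in this issue can be illustrated with a minimal sketch. It is written in Python for brevity and is not AMDGPU.jl code: `LRUCache`, `get_or_compile`, and `compile_fn` are hypothetical names, and the real `rocfunction_cache` may be keyed and structured differently. The sketch only shows the eviction policy: once the cache holds more than `capacity` entries, the least recently used one is dropped.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # OrderedDict keeps insertion order; the oldest entry sits at the front.
        self._entries = OrderedDict()

    def get_or_compile(self, key, compile_fn):
        """Return the cached value for key, calling compile_fn(key) on a miss."""
        if key in self._entries:
            # A hit makes this entry the most recently used one.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = compile_fn(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            # Drop the entry at the front, i.e. the least recently used one.
            self._entries.popitem(last=False)
        return value

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)
```

With a capacity of 2, requesting keys `a`, `b`, `a`, `c` in that order evicts `b`, because the second request for `a` refreshed it. An evicted executable is simply recompiled the next time it is requested, so the trade-off is memory held by cached executables against recompilation time.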

Implement occupancy estimator

We should be able to guess how well a given kernel can occupy a given piece of hardware. We should then be able to allow `@roc groupsize=auto ...` to automatically...

enhancement

User-accessible objects should print nicely

As pointed out in https://github.com/JuliaGPU/AMDGPU.jl/issues/68#issuecomment-791425492, objects like `RuntimeEvent{HSAStatusSignal}` print as some monstrosity that can easily be mistaken for an error. We should make sure that all user-facing objects print decently.

enhancement

good first issue

Allow dumping binaries to file

hsa

debugging

Allow preserving HSA executable

For debugging purposes, it would be helpful to keep executables around so that they can be inspected.

hsa

debugging

Implement exponential back-off for signal wait

It should start at a bit longer than the minimum possible kernel launch-and-complete latency, and then go up to a user-defined maximum.

performance
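
The clamped backoff schedule described in this issue can be sketched as follows. This is an illustration in Python, not the AMDGPU.jl implementation: `backoff_delays`, `wait_with_backoff`, and the `is_done` callback are hypothetical names, and the doubling factor is an assumption. The initial delay and the maximum are left as required parameters because the issue does not give concrete values.

```python
import time


def backoff_delays(initial, maximum, factor=2.0):
    """Yield delays that grow by `factor` and are clamped at `maximum`."""
    if initial <= 0 or maximum < initial or factor <= 1:
        raise ValueError("need 0 < initial <= maximum and factor > 1")
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def wait_with_backoff(is_done, initial, maximum, sleep=time.sleep):
    """Poll is_done() until it returns True, sleeping between polls.

    `initial` would be chosen slightly above the shortest expected
    launch-and-complete latency; `maximum` is the user-defined cap.
    Returns the number of polls performed.
    """
    polls = 0
    for delay in backoff_delays(initial, maximum):
        polls += 1
        if is_done():
            return polls
        sleep(delay)
```

Passing `sleep` as a parameter keeps the schedule testable without real delays, and replacing `backoff_delays` with another generator is one way a user-supplied backoff policy could be plugged in.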