[Issue]: Iterating Over Executables via the Vendor Loader API Mistakenly Includes Executables Destroyed Previously As `nullptr`
Problem Description
The destruction of an HSA Executable using hsa_executable_destroy will result in the loader deleting the Executable object previously allocated. However, instead of erasing the pointer in the std::vector of Executables currently loaded, it equates it to a nullptr:
https://github.com/ROCm/ROCR-Runtime/blob/3f6ffc5b1167a43dc5e169db85655182a4c5947c/src/loader/executable.cpp#L242-L257
After the destruction of any number of Executables, if one invokes the hsa_ven_amd_loader_iterate_executables function to iterate over all executables in HSA, the iteration also returns the nullptr that was placed instead of the destroyed executables:
https://github.com/ROCm/ROCR-Runtime/blob/3f6ffc5b1167a43dc5e169db85655182a4c5947c/src/loader/executable.cpp#L259-L276
This does not seem to be expected behavior. cc: @kzhuravl
Operating System
22.04
CPU
N/A
GPU
AMD Instinct MI100
Other
No response
ROCm Version
ROCm 6.0.0
ROCm Component
ROCR-Runtime
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
@matinraayai Internal ticket has been created to investigate this issue. Thanks!
Hello @matinraayai,
Thanks for the clear issue report! However, the way this code is written is intentional.
If you notice this line:
executables[((ExecutableImpl*)executable)->id()] = nullptr; in the DestroyExecutable function, you will see that the executables array is indexed into with the id() of the deleted object. So, if you removed this entry from the array entirely, you would need to reindex the ids of all subsequent entries before the next time this function is used. The entry is instead set to nullptr to avoid this reindexing step.
Does encountering a nullptr when iterating over the executables cause a problem?
Hello @matinraayai,
Thanks for the clear issue report! However, the way this code is written is intentional.
If you notice this line:
executables[((ExecutableImpl*)executable)->id()] = nullptr;in the DestroyExecutable function, you will see that the executables array is indexed into with the id() of the deleted object. So, if you removed this entry from the array entirely, you would need to reindex the ids of all subsequent entries before the next time this function is used. The entry is instead set to nullptr to avoid this reindexing step.
Is indexing over a vector the best way to go? It seems a set or an unordered map might do a better job here; The only usages I've seen for executables are either iterations over all entries, or removal of an entry (which the map data structures make easier). But this is an implementation detail.
Does encountering a nullptr when iterating over the executables cause a problem?
I believe so; The vendor loader API doesn't say anywhere that it might return hsa_executable_t{0} and the user has to check to make sure the handles are valid.
It seems all other loader functions that iterate over executables, they check if the iteration element is a nullptr before proceeding; I think the same check should be added to the executable iteration function of the loader API.
The vendor loader API doesn't say anywhere that it might return hsa_executable_t{0} and the user has to check to make sure the handles are valid. It seems all other loader functions that iterate over executables, they check if the iteration element is a nullptr before proceeding; I think the same check should be added to the executable iteration function of the loader API.
This is a good point, I am checking in with our internal team to see if this nullptr check can be patched in, or some documentation added.
Hi @matinraayai, a nullptr check has been added for consistency with other loader functions that iterate over executables - you'll see this code change in a future release of ROCm.
Thanks for bringing this up!