[ROCm] Hipify changes
- Add Hipify as a git submodule
- Trigger hipify from cmake build
- TP_USE_ROCM controls the trigger, which will be set to ON when building on ROCm
@lw, @jeffdaily, @jithunnair-amd: please review the changes.
@pruthvistony I think we discussed this before, but just to make sure: could the build_amd.py be part of hipify-torch so that it doesn't have to be added to the hipifying project's sources? Is there anything that's project-specific that cannot be passed in as arguments to build_amd.py?
As discussed, build_amd.py can be moved to the hipify-torch repo and triggered directly from CMake. The only difficulty is that passing all the parameters, such as a list of regexes, can be tricky from within CMake. Currently I don't see any specific problem with passing the arguments through build_amd.py. I have kept this change the same as it is used in other projects. I can make this update as a follow-up improvement, since it is not a blocker for the other ROCm build-related changes.
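To illustrate the parameter-passing concern, here is a minimal sketch of how a driver script could accept a list of regexes from CMake. It is not the actual build_amd.py or the hipify-torch interface: the option names (`--project-directory`, `--includes`, `--ignores`) and the helpers `split_list`, `parse_args`, and `select_files` are hypothetical. It assumes the list may arrive either as separate arguments (what CMake produces when an unquoted list variable is expanded) or as one semicolon-joined string (what a quoted variable produces), and it accepts both.

```python
import argparse
import os
import re


def split_list(values):
    """Flatten arguments, splitting any CMake-style "a;b;c" string into items."""
    return [item for value in values for item in value.split(";") if item]


def parse_args(argv=None):
    # All option names here are hypothetical, chosen only for illustration.
    parser = argparse.ArgumentParser(description="Sketch of a hipify driver CLI")
    parser.add_argument("--project-directory", required=True)
    parser.add_argument("--includes", nargs="+", default=[".*"],
                        help="regexes for files to process")
    parser.add_argument("--ignores", nargs="*", default=[],
                        help="regexes for files to skip")
    args = parser.parse_args(argv)
    args.includes = split_list(args.includes)
    args.ignores = split_list(args.ignores)
    return args


def select_files(root, includes, ignores):
    """Return sorted relative paths under root that match an include regex
    and do not match any ignore regex."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")  # match against forward-slash paths
            if any(re.search(pattern, rel) for pattern in ignores):
                continue
            if any(re.search(pattern, rel) for pattern in includes):
                selected.append(rel)
    return sorted(selected)
```

One limitation of this approach: a regex that itself contains a semicolon cannot be passed this way, which is one example of why list handling from CMake needs care.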
I could be OK with merging this but (correct me if I'm wrong) I don't think this is usable yet, is it? If anyone tried to build with TP_USE_ROCM, the build would most likely fail. Should we fix all that and ensure everything works before we do that?
Also, while I guess we cannot run any ROCm tests on CircleCI (because it's lacking the necessary hardware), I think we should be able to at least add a ROCm build job. Do you think you could do that? Thanks!
Currently, PyTorch ROCm tests are executed on Jenkins. The changes are not yet usable; I am raising them in multiple PRs, as was suggested previously. The next PR, which builds TensorPipe with 'TP_USE_ROCM' enabled, is in draft mode but is not complete because of an unsupported API, which we are working on with the HIP team.
Regarding the ROCm build, I believe a job can be set up. Regarding the tests, I will check on the necessary hardware and get back to you.
> Currently, PyTorch ROCm tests are executed on Jenkins.
Yes, I saw that. However, that's because they are both built and run on AMD GPUs, right? If we only built, without running, we could do so on machines without any GPUs. (PyTorch already builds the CUDA version on CPU-only machines.) It wouldn't be perfect, but it would at least catch build issues, for example those caused by hipification.
Moved all the hipify-related files into the hipify-torch repo.