Tensile
[Feature]: support for gfx1103
Suggestion Description
Will there be support for gfx1103 any time soon?
Operating System
Arch Linux
GPU
Radeon 780M
ROCm Component
rocBLAS and Tensile
I think Fedora is carrying patches that enable rocBLAS and Tensile on gfx1103. However, it's not officially supported by AMD and I'm not sure how well it works.
Tensile Patch: https://src.fedoraproject.org/rpms/python-tensile/blob/6f308b0956b7736ae874b07f8ebc9f404fa2fae5/f/0001-enable-gfx1103-for-Tensile.patch
rocBLAS Patch: https://src.fedoraproject.org/rpms/rocblas/blob/74df24057a4579f507a50431aaa96ae7484d1567/f/0001-add-gfx1103-support-for-rocBLAS.patch
rocm_sdk_builder now also has initial support for gfx1103. I have tested it on a Framework 16 laptop, where I have both gfx1102 and gfx1103.
https://github.com/lamikr/rocm_sdk_builder
It also carries rocBLAS patches that add support for some other GPUs.
At the moment we are working on tuning to improve the logic in rocm_sdk_builder, but I am seeing a problem there, at least when using Tensile from the rocm-6.1.2 release.
If I run the following (with either example_gfx1035_tuning.yaml or example_vega10_tuning.yaml):
Tensile/Tensile/bin/Tensile example_gfx1035_tuning.yaml . > tuning1.out 2>&1
I get the following error:
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoi
So far I have traced the errors to ResultFileReporter.cpp, and I can get rid of them by commenting out these three stoi conversions:
else if(key == ResultKey::GfxFrequency)
{
    //m_currGfxClock = static_cast<uint16_t>(std::stoi(valueStr));
    m_currGfxClock = 0;
}
else if(key == ResultKey::Power)
{
    //m_currPower = static_cast<uint16_t>(std::stoi(valueStr));
    m_currPower = 0;
}
else if(key == ResultKey::TemperatureHot)
{
    //m_currTemperatureHot = static_cast<uint16_t>(std::stoi(valueStr));
    m_currTemperatureHot = 0;
}
Is there an easy way to print valueStr to stdout or stderr from this code that runs on the GPU?
I searched, and a similar-looking stoi error was reported in a comment on pull request https://github.com/ROCm/Tensile/pull/1888
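Since std::stoi is a host-side standard library call, the exception above is raised on the CPU and ordinary stderr output works there. A minimal sketch of a guarded conversion follows, assuming the goal is to log the offending string and fall back to 0 rather than abort. The helper name parseUint16OrZero and its keyName parameter are hypothetical and not part of Tensile.

```cpp
#include <cstdint>
#include <iostream>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Tensile): convert a reported value such as
// a clock, power, or temperature reading to uint16_t. On failure it prints
// the raw string to stderr and returns 0 instead of letting the exception
// terminate the program.
inline std::uint16_t parseUint16OrZero(const std::string& keyName,
                                       const std::string& valueStr)
{
    try
    {
        const int value = std::stoi(valueStr);
        if(value < 0 || value > std::numeric_limits<std::uint16_t>::max())
        {
            std::cerr << "Value out of uint16_t range for " << keyName
                      << ": '" << valueStr << "'\n";
            return 0;
        }
        return static_cast<std::uint16_t>(value);
    }
    catch(const std::invalid_argument&)
    {
        // Thrown for empty or non-numeric input such as "" or "N/A".
        std::cerr << "Non-numeric value for " << keyName
                  << ": '" << valueStr << "'\n";
        return 0;
    }
    catch(const std::out_of_range&)
    {
        // Thrown when the number does not fit in an int.
        std::cerr << "Value too large for " << keyName
                  << ": '" << valueStr << "'\n";
        return 0;
    }
}
```

Under these assumptions, each commented-out line could become something like m_currGfxClock = parseUint16OrZero("GfxFrequency", valueStr); which keeps the 0 fallback while showing what the string actually contained.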
Hi @NeoChen1024, as @cgmb mentioned, there isn't any plan to officially support gfx1103/Radeon 780M. Let me know if you have any other questions, otherwise I can close out this issue.
You can try ROCm 6.3.1 by setting: export HSA_OVERRIDE_GFX_VERSION=11.0.0
My experience with it comes from doing a complex matrix multiply:
- Use the following functions:
  - rocblas_initialize: this throws lots of errors in the syslog, but seems to work anyway.
  - rocblas_create_handle
  - hipMalloc, not rocblas_device_malloc_alloc, as that does not work.
  - rocblas_cgemm: this just puts the complex matrix multiply request on a queue.
  - hipDeviceSynchronize: ensures the work is actually done.
- Once the hipMalloc is done, you can write directly into those allocations just as if they came from a normal CPU malloc. There is no need to copy the data from CPU to GPU areas and back again, as it's an APU without separate VRAM.
- The APU's iGPU can use up to about 34GB of RAM if you have that much installed. You have to use hipMalloc to ensure the RAM is allocated in a place the iGPU can access. If the BIOS has a VRAM setting, try to set it as small as possible; this allows hipMalloc to use more of the 34GB maximum. hipMalloc allocates separately, not from the VRAM allocation.
- ROCm seems to have really bad error handling, so do your own parameter checks before calling ROCm to limit the likelihood of it failing.
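The advice about checking parameters yourself can be sketched as below. This is only an illustration based on the usual column-major BLAS conventions for GEMM dimensions and leading dimensions. The Op enum, the GemmArgs struct, and checkGemmArgs are hypothetical names and not rocBLAS definitions, and passing these checks does not guarantee that the library call will succeed.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical types for illustration only; these are not rocBLAS types.
enum class Op { None, Transpose };

struct GemmArgs
{
    Op transA, transB;
    std::int64_t m, n, k;       // C is m x n; k is the inner dimension
    std::int64_t lda, ldb, ldc; // leading dimensions, column-major storage
    const void* A;
    const void* B;
    void* C;
};

// Returns an empty string when the arguments look sane, otherwise a short
// description of the first problem found.
inline std::string checkGemmArgs(const GemmArgs& g)
{
    if(g.m < 0 || g.n < 0 || g.k < 0)
        return "m, n and k must be non-negative";

    // Number of rows of A and B as they are stored in memory.
    const std::int64_t rowsA = (g.transA == Op::None) ? g.m : g.k;
    const std::int64_t rowsB = (g.transB == Op::None) ? g.k : g.n;

    if(g.lda < std::max<std::int64_t>(1, rowsA))
        return "lda is smaller than the stored row count of A";
    if(g.ldb < std::max<std::int64_t>(1, rowsB))
        return "ldb is smaller than the stored row count of B";
    if(g.ldc < std::max<std::int64_t>(1, g.m))
        return "ldc is smaller than m";

    // Pointers only matter when there is actual work to do.
    if(g.m > 0 && g.n > 0)
    {
        if(g.C == nullptr)
            return "C must not be null";
        if(g.k > 0 && (g.A == nullptr || g.B == nullptr))
            return "A and B must not be null";
    }
    return {};
}
```

A caller would run the check first and skip the GEMM call, logging the returned message, whenever the string is non-empty.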
@sohaibnd Can we add support code for gfx1103 and mark it as unofficially supported (perhaps requiring a special compile config to enable it)? Currently, only some Linux distributions ship ROCm software stack packages with gfx1103 support. Adding it upstream would let more distributions benefit from it.
@Headcrabed Unfortunately it's not that simple. Adding support in any meaningful way requires addressing the practical aspects of enablement, such as testing to make sure it works and debugging the issues that arise (not just in Tensile).
Closing this issue as there is no further action here.