
[Feature]: support for gfx1103

Open NeoChen1024 opened this issue 1 year ago • 4 comments

Suggestion Description

Will there be support for gfx1103 any time soon?

Operating System

Arch Linux

GPU

Radeon 780M

ROCm Component

rocBLAS and Tensile

NeoChen1024 avatar May 23 '24 07:05 NeoChen1024

I think Fedora is carrying patches that enable rocBLAS and Tensile on gfx1103. However, it's not officially supported by AMD and I'm not sure how well it works.

cgmb avatar Jul 10 '24 17:07 cgmb

Tensile patch: https://src.fedoraproject.org/rpms/python-tensile/blob/6f308b0956b7736ae874b07f8ebc9f404fa2fae5/f/0001-enable-gfx1103-for-Tensile.patch

rocBLAS patch: https://src.fedoraproject.org/rpms/rocblas/blob/74df24057a4579f507a50431aaa96ae7484d1567/f/0001-add-gfx1103-support-for-rocBLAS.patch

cgmb avatar Jul 10 '24 17:07 cgmb

rocm_sdk_builder now also has initial support for gfx1103. I have tested it on a Framework 16 laptop, which has both a gfx1102 and a gfx1103 GPU.

https://github.com/lamikr/rocm_sdk_builder

It also carries rocBLAS patches that add support for some other GPUs.

lamikr avatar Jul 18 '24 00:07 lamikr

On rocm_sdk_builder we are currently working on improving the tuning logic, but I am running into a problem, at least when using Tensile from the rocm-6.1.2 release.

If I run the following (or the same command with example_vega10_tuning.yaml): Tensile/Tensile/bin/Tensile example_gfx1035_tuning.yaml . > tuning1.out 2>&1

I get the following error:

terminate called after throwing an instance of 'std::invalid_argument'
what(): stoi

So far I have traced the exceptions to ResultFileReporter.cpp, and I can get rid of them by commenting out these three stoi conversions:

            else if(key == ResultKey::GfxFrequency)
            {
                //m_currGfxClock = static_cast<uint16_t>(std::stoi(valueStr));
                m_currGfxClock = 0;
            }
            else if(key == ResultKey::Power)
            {
                //m_currPower = static_cast<uint16_t>(std::stoi(valueStr));
                m_currPower = 0;
            }
            else if(key == ResultKey::TemperatureHot)
            {
                //m_currTemperatureHot = static_cast<uint16_t>(std::stoi(valueStr));
                m_currTemperatureHot = 0;
            }

Is there an easy way to print valueStr to stdout or stderr from this code that is run on the GPU?

I searched and found that a similar-looking stoi error was reported in a comment on pull request https://github.com/ROCm/Tensile/pull/1888
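One possible way to see the offending value, sketched here under the assumption that this parsing runs as ordinary host-side C++ (which the std::invalid_argument from stoi suggests): wrap the conversion in a small helper that logs the raw string to stderr and falls back to a default instead of terminating. The helper name parseOrDefault and its signature are hypothetical and not part of Tensile.

```cpp
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Tensile: convert valueStr to uint16_t.
// On failure, print the raw string to stderr and return the fallback
// instead of letting the exception terminate the program.
std::uint16_t parseOrDefault(const std::string& valueStr,
                             const char*        keyName,
                             std::uint16_t      fallback = 0)
{
    try
    {
        // std::stoi throws std::invalid_argument for non-numeric input
        // and std::out_of_range when the value does not fit in an int.
        int value = std::stoi(valueStr);
        if(value < 0 || value > UINT16_MAX)
            throw std::out_of_range("value does not fit in uint16_t");
        return static_cast<std::uint16_t>(value);
    }
    catch(const std::exception& e)
    {
        std::cerr << "could not parse " << keyName << " from \"" << valueStr
                  << "\" (" << e.what() << "), using " << fallback << '\n';
        return fallback;
    }
}
```

With something like this in place, a line such as `m_currPower = parseOrDefault(valueStr, "Power");` would keep the run going while showing exactly which string could not be converted.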

lamikr avatar Jul 18 '24 01:07 lamikr

Hi @NeoChen1024, as @cgmb mentioned, there isn't any plan to officially support gfx1103/Radeon 780M. Let me know if you have any other questions, otherwise I can close out this issue.

sohaibnd avatar Dec 20 '24 20:12 sohaibnd

You can try ROCm 6.3.1 with this override: export HSA_OVERRIDE_GFX_VERSION=11.0.0

My experience with it comes from doing a complex matrix multiply:

  1. Use the following functions, in this order:
     - rocblas_initialize <- this throws lots of errors in the syslog, but seems to work anyway.
     - rocblas_create_handle
     - hipMalloc, not rocblas_device_malloc_alloc, as that does not work.
     - rocblas_cgemm <- this just puts the complex matrix multiply request on a queue.
     - hipDeviceSynchronize <- ensures the work is actually done.

  2. Once the hipMalloc is done, you can write directly into those allocations as if they came from a normal CPU malloc. There is no need to copy the data from CPU to GPU memory and back again, as it's an APU without separate vRAM.

  3. The APU's iGPU can use up to about 34GB RAM if you have installed that much. You have to use hipMalloc to ensure the RAM is allocated in a place the iGPU can access. If the BIOS has a vRAM setting, set it as small as possible; this allows hipMalloc to use more of the 34GB RAM maximum. hipMalloc allocates separately, not from the vRAM allocation.

  4. ROCm seems to have really bad error handling, so do your own parameter checks before calling ROCm to limit the likelihood of it failing.
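As an illustration of point 4, here is a minimal sketch of the kind of check that could run before a GEMM call. It assumes a column-major, non-transposed multiply C (m x n) = A (m x k) * B (k x n) and applies the standard BLAS leading-dimension rules. The GemmArgs struct and checkGemmArgs function are hypothetical names made up for this example; they are not part of rocBLAS.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical bundle of the arguments for a column-major, non-transposed
// GEMM: C (m x n) = alpha * A (m x k) * B (k x n) + beta * C.
struct GemmArgs
{
    std::int64_t m, n, k;       // matrix dimensions
    std::int64_t lda, ldb, ldc; // leading dimensions of A, B and C
    const void*  A;
    const void*  B;
    void*        C;
};

// Returns an empty string if the arguments look sane, otherwise a short
// description of the first problem found.
std::string checkGemmArgs(const GemmArgs& g)
{
    const std::int64_t one = 1;
    if(g.m < 0 || g.n < 0 || g.k < 0)
        return "m, n and k must be non-negative";
    // Standard BLAS rule: a leading dimension must cover the row count.
    if(g.lda < std::max(one, g.m))
        return "lda must be at least max(1, m)";
    if(g.ldb < std::max(one, g.k))
        return "ldb must be at least max(1, k)";
    if(g.ldc < std::max(one, g.m))
        return "ldc must be at least max(1, m)";
    // Buffers are only dereferenced when there is something to compute.
    if(g.m > 0 && g.n > 0)
    {
        if(g.C == nullptr)
            return "C must not be null";
        if(g.k > 0 && (g.A == nullptr || g.B == nullptr))
            return "A and B must not be null";
    }
    return "";
}
```

A caller would run this check first and only submit the multiply when the returned string is empty, so that obviously invalid requests never reach the library.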

jcdutton avatar Jan 02 '25 00:01 jcdutton

> Hi @NeoChen1024, as @cgmb mentioned, there isn't any plan to officially support gfx1103/Radeon 780M. Let me know if you have any other questions, otherwise I can close out this issue.

@sohaibnd Can we add support code for gfx1103 and mark it as unofficially supported (perhaps requiring a special compile-time option to enable it)? Currently, only some Linux distributions ship ROCm software stack packages with gfx1103 support. If the support were added upstream, more distributions could benefit from it.

Headcrabed avatar Jan 10 '25 02:01 Headcrabed

@Headcrabed Unfortunately it's not that simple. Adding support in any meaningful way requires addressing the practical aspects of enablement, such as testing to make sure it works and debugging the issues that arise (not just in Tensile).

sohaibnd avatar Jan 15 '25 22:01 sohaibnd

Closing this issue as there is no further action here.

sohaibnd avatar Jan 17 '25 15:01 sohaibnd