[Issue]: `amd-smi` C++ Interface does not report correct information in MI300A

Open garrettbyrd opened this issue 8 months ago • 1 comments

Problem Description

Here is a short C++ program that iterates over the sockets and should print the type and name of each device on every socket.

#include <iostream>
#include <vector>
#include <unistd.h>
#include <amd_smi/amdsmi.h>

int main() {
    amdsmi_status_t status = amdsmi_init(AMDSMI_INIT_AMD_APUS);
    if (status != AMDSMI_STATUS_SUCCESS) {
       std::cerr << "Failed to initialize AMD SMI library" << std::endl;
       return -1;
    }

    uint32_t socket_count = 0;
    status = amdsmi_get_socket_handles(&socket_count, nullptr);
    std::cout << "Socket Total: " << socket_count << std::endl;
    std::vector<amdsmi_socket_handle> sockets(socket_count);

    for (uint32_t i = 0; i < socket_count; i++) {
        uint32_t device_count = 0;
        status = amdsmi_get_processor_handles(sockets[i], &device_count, nullptr);
        std::vector<amdsmi_processor_handle> processor_handles(device_count);
        status = amdsmi_get_processor_handles(sockets[i], &device_count, &processor_handles[0]);

        for (uint32_t j = 0; j < device_count; j++){
           // Get device type
           processor_type_t processor_type;
           status = amdsmi_get_processor_type(processor_handles[j], &processor_type);
           std::cout << "processor type: " << processor_type << std::endl;

           // Get device name
           amdsmi_board_info_t board_info;
           status = amdsmi_get_gpu_board_info(processor_handles[j], &board_info);
           std::cout << "\tdevice: " << j << "\n\t\tname:" << board_info.product_name << std::endl;
        }

        std::cout << "socket: " << i << std::endl;
        std::cout << "device count: " << device_count << std::endl;
     }

    status = amdsmi_shut_down();

    return 0;
}

However, amd-smi reports zero devices per socket:

Socket Total: 8
socket: 0
device count: 0
socket: 1
device count: 0
socket: 2
device count: 0
socket: 3
device count: 0
socket: 4
device count: 0
socket: 5
device count: 0
socket: 6
device count: 0
socket: 7
device count: 0

Version info:

$ amd-smi version -g -c
AMDSMI Tool: 25.3.0+ede62f2 | AMDSMI Library version: 25.3.0 | ROCm version: 6.4.0 | amdgpu version: 6.10.5 | amd_hsmp version: 2.2

Operating System

Rocky Linux 9.5 (Blue Onyx) x86_64

CPU

4 x AMD Instinct MI300A Accelerator (192) @ 3.70 GHz

GPU

4 x AMD Instinct MI300A Accelerator (192) @ 3.70 GHz

ROCm Version

ROCm 6.4.0

ROCm Component

amdsmi

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

garrettbyrd avatar Apr 24 '25 18:04 garrettbyrd

Hi @garrettbyrd. Internal ticket has been created to investigate this issue. Thanks!

ppanchad-amd avatar Apr 28 '25 18:04 ppanchad-amd

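Hi @garrettbyrd, the first call to amdsmi_get_socket_handles only returns the socket count, because it is passed nullptr for the handle buffer. A second call with an allocated buffer is needed to retrieve the socket handles. Your code omits that second call, so the sockets vector is never filled in.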

    uint32_t socket_count = 0;
    // First call to amdsmi_get_socket_handles will return socket count
    status = amdsmi_get_socket_handles(&socket_count, nullptr);
    std::cout << "Socket Total: " << socket_count << std::endl;
    std::vector<amdsmi_socket_handle> sockets(socket_count);
    // Second call to amdsmi_get_socket_handles will return socket handles
    status = amdsmi_get_socket_handles(&socket_count, &sockets[0]);

You can find the Hello AMD SMI example at https://rocm.docs.amd.com/projects/amdsmi/en/latest/how-to/amdsmi-cpp-lib.html#hello-amd-smi, which also showcases this functionality. With this change, your code outputs the following on an MI300A system.

Socket Total: 8
processor type: 1
        device: 0
                name:Aqua Vanjaram [Instinct MI300A]
socket: 0
device count: 1
processor type: 1
        device: 0
                name:Aqua Vanjaram [Instinct MI300A]
socket: 1
device count: 1
processor type: 1
        device: 0
                name:Aqua Vanjaram [Instinct MI300A]
socket: 2
device count: 1
processor type: 1
        device: 0
                name:Aqua Vanjaram [Instinct MI300A]
socket: 3
device count: 1
socket: 4
device count: 0
socket: 5
device count: 0
socket: 6
device count: 0
socket: 7

harkgill-amd avatar Jun 25 '25 18:06 harkgill-amd

Closing this issue out - feel free to leave a comment if you have any questions.

harkgill-amd avatar Jul 08 '25 14:07 harkgill-amd