[Issue]: supermassive RAM usage with TensileCreateLibrary

Open aviallon opened this issue 9 months ago • 3 comments

Problem Description

Building hipBLASLt requires Tensile, more precisely TensileCreateLibrary.py. Even with 64 GB of RAM, zswap enabled, 32 GB of swap, and the build limited to only ONE core, the script gets killed by the OOM killer! I sincerely think there might be a way not to load all the data at once during whatever TensileCreateLibrary does. Either that or it is leaking memory, since its memory usage steadily increases the whole way through!
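To make the "don't load everything at once" idea concrete, here is a minimal Python sketch. It is not taken from TensileCreateLibrary: `make_records` and `process` are hypothetical stand-ins for whatever per-kernel data the script handles, and the only point is to compare peak memory between an eager list and a streaming generator using the standard `tracemalloc` module.

```python
import tracemalloc


def make_records(n):
    """Hypothetical stand-in for per-kernel data (NOT Tensile's real structures)."""
    for _ in range(n):
        yield bytes(1024)  # 1 KiB payload per record


def process(record):
    """Hypothetical per-record work; here it just measures the payload."""
    return len(record)


def eager(n=2000):
    # Materialize every record first: all payloads are resident at once.
    records = list(make_records(n))
    return sum(process(r) for r in records)


def streaming(n=2000):
    # Consume the generator directly: only one payload is alive at a time.
    return sum(process(r) for r in make_records(n))


def peak_bytes(fn):
    """Return the peak Python-level allocation (in bytes) while running fn."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print("eager peak:    ", peak_bytes(eager))
    print("streaming peak:", peak_bytes(streaming))
```

Both functions compute the same result, but the eager version's peak grows with the number of records while the streaming one stays roughly flat. Whether TensileCreateLibrary can be restructured this way depends on its internals, which this sketch does not model.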

Operating System

NixOS 25.05 (experimental branch)

CPU

AMD Ryzen 9 5950X

GPU

AMD Radeon RX 6700 XT + AMD Radeon RX Vega 64

ROCm Version

ROCm 6.3.3

ROCm Component

Tensile

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

aviallon, Mar 26 '25 19:03

Hi @aviallon. An internal ticket has been created to investigate your issue. Thanks!

ppanchad-amd, Mar 27 '25 15:03

I think this should be transferred to the hipBLASLt issue tracker: although hipBLASLt reuses the name Tensile, it ships its own self-contained version.

TorreZuk, Mar 27 '25 15:03

@aviallon this is a known issue that we are working on. We will follow up over the next few weeks once we have the relevant changes in.

davidd-amd, Apr 10 '25 23:04

@aviallon we've landed some changes that improve memory usage over the last month or so. All changes should be in develop.

davidd-amd, May 28 '25 22:05

This issue has been migrated to: https://github.com/ROCm/rocm-libraries/issues/316

Closing the issue in this repo. Please refer to the migrated issue for updates.

idass1990, Jun 20 '25 21:06