Anton Smirnov

https://pxl-th.github.io/

Ukraine, Kyiv

Results: 213 comments by Anton Smirnov

rocWMMA support

And at this moment matrix multiplication is not a bottleneck in DL applications for AMDGPU. Timely memory freeing is.

[Feature]: Allow specifying maximum size for memory pool

@jaydeeppatel1111 am I missing something, or do the ROCm docs [contain](https://rocm.docs.amd.com/projects/HIP/en/docs-6.1.0/doxygen/html/structhip_mem_pool_props.html#a214586a7598eb73e4ff5ebb8aed5294d) information about the `maxSize` field while the actual release does not include https://github.com/ROCm/clr/commit/b72d8da1bdd6547c86baa119f1bacab4d418a5ea ? I'm not able to find...

`hipFreeAsync` hangs

I also ran the tests using debug builds of Julia and HIP, and besides hitting [this](https://github.com/ROCm-Developer-Tools/clr/issues/36) assert (which I commented out), there were no other issues.

`hipFreeAsync` hangs

Unfortunately, I was unable to create an MWE as it is unclear to me what causes it. Running the tests one-by-one does not reproduce it, only when running them all....

`hipFreeAsync` hangs

Also, on Windows there are no issues at all with RX7900XT, it passes all AMDGPU.jl tests without hanging.

`hipFreeAsync` hangs

@iassiour, not sure if this is expected, but I noticed that async malloc/free is ~300x slower than non-async (tried on RX6700 XT and RX7900 XT). MWE: ```cpp #include #include using...

`hipFreeAsync` hangs

Indeed, allocations smaller than 8 bytes are much slower. Thanks! However, with e.g. 16 bytes it is still 3-5x slower: ``` pxl-th@Leleka:~/code$ time ./a.out Regular real 0m0,255s user 0m0,203s sys...

`hipFreeAsync` hangs

Thank you for the fix! Regarding `hipFreeAsync` and hangs, I recently upgraded to ROCm 6 and when running AMDGPU.jl tests it reported some page faults (and errored instead of hanging),...

`hipFreeAsync` hangs

There are tests that reliably trigger the hang. In Julia we use Task-Local State (TLS) as opposed to Thread-Local State. And each Task in Julia has its own HIP stream,...

`hipFreeAsync` hangs

Reviving this as I have a fairly small MWE that consistently reproduces the issue on ROCm 6.0.2 with an RX7900 XTX. Again in Julia as it is much easier to set...
