Bert Wesarg

Results: 91 comments by Bert Wesarg

See [#112](/RadeonOpenCompute/rocm_smi_lib/issues/112)

I'm not sure how I can resolve this. The code itself still looks the [same](https://github.com/ROCm/HIP/blob/310b089cbe21e29961797845dc5ddc30533a9d46/include/hip/hip_runtime_api.h#L114C1-L114C84):

```c
size_t totalConstMem; ///< Size of shared memory region (in bytes).
```

@geimer @cfeld @Flamefire please give this some eyes, thanks

probably the same issue I reported here: https://github.com/ROCm-Developer-Tools/roctracer/issues/65

But this workaround is not documented, and it should not actually be needed. So I do not consider this solved, but I haven't tested if anything has changed in the...

I can still reproduce this issue (https://github.com/gcongiu/rocm-issues/tree/main/issue-80) with ROCm-5.7.0 build 19 on a dual MI210 node. I will now update to 5.7.0 build 36 and test again. ``` Memory access...

I got ROCm 5.7.0 build 48, but the error remains:

```
Memory access fault by GPU node-8 (Agent handle: 0x1db02f0) on address 0x7fa08547c000. Reason: Unknown.
Aborted (core dumped)
```

My tests did not yet include launching kernels from multiple threads into the same queue. I could extend my own mini test to do this for sure and run it...

I see this on CentOS (yum) and Ubuntu (apt).

OS on our failing nodes:

```
Distributor ID: CentOS
Description: CentOS Linux release 7.7.1908 (Core)
Release: 7.7.1908
Codename: Core
```