Torre Zuk

42 comments by Torre Zuk

Sorry, my last comment fell into a GitHub commit, so it is almost invisible: "Is this available on all OS and container controls? Why not get CI to tell you the limit, Eiden...

Thanks for your report @torrance. rocBLAS supports the equivalent of cublasGemmEx with the function rocblas_gemm_ex, described here: https://rocm.docs.amd.com/projects/rocBLAS/en/latest/API_Reference_Guide.html#rocblas-gemm-ex-batched-strided-batched. It implements numerous mixed precision and high precision accumulation (HPA) variants, so please...
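To illustrate why high precision accumulation (HPA) matters, here is a minimal CPU-only sketch. It does not call rocBLAS or any GPU library; the helper names (`to_half`, `dot_half_accumulate`, `dot_high_precision_accumulate`) are hypothetical, and Python's 64-bit float stands in for the wider accumulator type, which is an assumption made only for illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def dot_half_accumulate(a, b):
    """Dot product where each product and each partial sum is rounded to half."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_half(acc + to_half(x * y))
    return acc


def dot_high_precision_accumulate(a, b):
    """Dot product with half-precision inputs but a wider accumulator.

    Python floats are 64-bit; this stands in for the wider compute type.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


n = 4096
a = [to_half(0.1)] * n  # inputs are stored in half precision in both cases
b = [to_half(1.0)] * n

low = dot_half_accumulate(a, b)
high = dot_high_precision_accumulate(a, b)
print("half accumulator:  ", low)
print("wider accumulator: ", high)
```

With a half-precision accumulator, the running sum eventually becomes so large that each new term is smaller than half the spacing between representable values and is rounded away, so the result stalls below the true sum. Keeping the accumulator in a wider type avoids that loss while the inputs remain in half precision.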

Sure, we can recycle this issue as a request for an equivalent to cublasCherkEx(), which is a new feature request. I can ask if @emankov has any insights into the cublasGemmEx() hipify mapping to...

@pxl-th thanks for the update. I expect the memory spike will reduce further with later releases.

> Pulling /opt/rocm-4.5.0/include just pollutes all my include paths.

I understand your complaint; at a couple of meetings I tried unsuccessfully to discourage the switching to #include style in the new...

@dengelt thanks for your input. Have you created this as a general issue for [ROCm](https://github.com/ROCm/ROCm/issues)? Or do you only use rocBLAS and would benefit from a pkg-config for...

Well, yes, @dengelt, it is good to raise it at the bottom of the stack, so in HIP or even as an issue in ROCm/ROCm. Any new system will take a coordinated effort but...

Thanks @dengelt for raising this issue. Your issue in HIP, https://github.com/ROCm/HIP/issues/3378#issue-2040331584, has led to an internal investigation and a tracked ticket, so any progress on this topic in HIP will...

Thanks for your update @jbaileyhandle. I too was concerned about the state of the virtualenv, as the CMake configure step likely activates and runs a partial processing step for Tensile via...

Not sure what you are trying to build in parallel, but if you are using all the cores for each stage, it is simpler to debug when the steps are done in order. I would...