Szilárd Páll

Results 10 issues of Szilárd Páll

Documentation of the config file as well as full man pages are missing.

**Describe the motivation for the feature request** Currently, setting LD_LIBRARY_PATH before launching an application that uses hipSYCL is required. **Describe the solution you'd like** Not have to set LD_LIBRARY_PATH to...

enhancement

The following change, which only refactors code in the GROMACS OpenCL kernels, causes the OpenCL compiler to crash: https://gerrit.gromacs.org/#/c/7810/19/src/gromacs/mdlib/nbnxn_ocl/nbnxn_ocl_kernel_utils.clh The culprit has been isolated to the linked changes on...

Multiple clFFT tests fail on both Vega10 and Fiji with ROCm 1.9. Repro ingredients ROCm 1.9 ``` $ dpkg -l | grep rocm-opencl ii rocm-opencl 1.2.0-2018090737 amd64 OpenCL/ROCm ii rocm-opencl-dev...

When compiling with `-cl-opt-disable`, I get the following errors (one for each kernel function): ``` : error: can't create dynamic relocation R_AMDGPU_REL32_LO against symbol: norm2 in readonly segment; recompile object...

GROMACS runs that seemed fine before stall and fail to complete since the last ROCm update. Symptoms: with small inputs that run ~100s of microseconds per iteration (one clFinish per...

As rocFFT does not have OpenCL bindings, a relatively easy way (as suggested [here](https://github.com/ROCmSoftwarePlatform/rocFFT/issues/120#issuecomment-380488475)) would be to load rocFFT binaries with `clCreateProgramWithBinary` to be able to use them in an...

Would like to ideally have atomic_add(); I'm assuming the hardware supports resolving conflicts.

During `gmres_device_solve` there are idle gaps in the GPU utilization due to (IIUC): * global reductions (following glsc3_reduce_kernel) and * a small CPU kernel https://github.com/ExtremeFLOW/neko/blob/1753fa9e89bd83e52a704523acfa103c0fb0cbc3/src/krylov/bcknd/device/gmres_device.F90#L435 Both of these are preceded...

GPU
performance

I see a number of memsets and device-to-device memcopies, none of which is overlapped with compute. Based on a Leonardo TGV 256k run there are up to ~3% wall-time spent...

GPU
performance