Enable gesvda and fix geqrf
Fixes SWDEV-407984 and SWDEV-392430
According to the Math Library team, returning an error when batch_count == 0 is expected behavior, so I'm making the temporary workaround permanent.
PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose --use-pytest -i test_linalg.py
shows
```
FAILED [0.0052s] test_linalg.py::TestLinalgCUDA::test_linalg_lstsq_batch_broadcasting_cuda_complex128
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======== 1 failed, 233 passed, 39 skipped, 2 rerun in 96.84s (0:01:36) =========
```
@xinyazhang Could you please add a comment to this PR showing that test_linalg.py TestLinalgCUDA.test_svd_* passes for each data type? Make sure to force the gesvda path to verify that the new gesvda drivers are indeed being called. Thank you!
Hi @alugorey, do you know how to enable rocSOLVER logging? I've tried
ROCSOLVER_LAYER=7 ROCSOLVER_LEVELS=99 ROCSOLVER_LOG_TRACE_PATH=t ROCSOLVER_LOG_BENCH_PATH=b ROCSOLVER_LOG_PROFILE_PATH=p
but it doesn't work.
Okay, I've fixed the enabling of gesvda, but apparently U and Vh are incorrect, even though the singular values (Sigma) are right.
I'll hand this over to the Math Library team after confirming with rocSOLVER.
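A quick way to see whether only the singular values or also the factors are right is to check the reconstruction U·diag(S)·Vh against the input and the orthonormality of U and Vh. The following is a minimal sketch, not the test from test_linalg.py: the helper name `svd_factor_errors` is hypothetical, and it assumes a PyTorch version where `torch.linalg.svd` accepts the `driver` keyword for GPU tensors. Whether `driver="gesvda"` actually reaches the new code path on a ROCm build is an assumption that still needs to be confirmed, for example through logging.

```python
import torch


def svd_factor_errors(a, driver=None):
    """Return (reconstruction, U-orthonormality, Vh-orthonormality) max errors.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    # `driver` is only accepted for GPU tensors, so leave it unset on CPU.
    kwargs = {"driver": driver} if a.is_cuda and driver is not None else {}
    u, s, vh = torch.linalg.svd(a, full_matrices=False, **kwargs)

    # Singular values are real; cast to the input dtype and scale U's columns.
    recon = (u * s.to(a.dtype).unsqueeze(-2)) @ vh
    eye = torch.eye(s.shape[-1], dtype=a.dtype, device=a.device)

    recon_err = (recon - a).abs().max().item()
    u_err = (u.mH @ u - eye).abs().max().item()
    vh_err = (vh @ vh.mH - eye).abs().max().item()
    return recon_err, u_err, vh_err


def report_all_dtypes(device, driver="gesvda"):
    """Print the three errors for each floating and complex dtype."""
    for dtype in (torch.float32, torch.float64, torch.complex64, torch.complex128):
        a = torch.randn(3, 5, 4, dtype=dtype, device=device)
        print(dtype, svd_factor_errors(a, driver=driver))
```

Calling `report_all_dtypes("cuda")` on the GPU build and `report_all_dtypes("cpu")` as a reference should make a wrong U or Vh show up as a large first or second/third error even when the singular values agree.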
Hi @xinyazhang, sorry, I'm just catching up on email. I've only worked with ROCBLAS_LAYER=N. I'd assume this is what you need, as rocSOLVER is just a thin wrapper around rocBLAS. You can find more details here: https://confluence.amd.com/pages/viewpage.action?spaceKey=~pensun&title=Collect+unique+rocBLAS+and+MIOpen+configs+from+application
Just to document: enabling the rocSOLVER logging mechanism requires a call to rocsolver_log_begin.
The following Python code makes it possible to enable rocSOLVER logging without recompiling torch:
#!/usr/bin/env python
from cffi import FFI
ffi = FFI()
ffi.cdef("""
typedef enum rocblas_status_
{
rocblas_status_success = 0, /**< Success */
rocblas_status_invalid_handle = 1, /**< Handle not initialized, invalid or null */
rocblas_status_not_implemented = 2, /**< Function is not implemented */
rocblas_status_invalid_pointer = 3, /**< Invalid pointer argument */
rocblas_status_invalid_size = 4, /**< Invalid size argument */
rocblas_status_memory_error = 5, /**< Failed internal memory allocation, copy or dealloc */
rocblas_status_internal_error = 6, /**< Other internal library failure */
rocblas_status_perf_degraded = 7, /**< Performance degraded due to low device memory */
rocblas_status_size_query_mismatch = 8, /**< Unmatched start/stop size query */
rocblas_status_size_increased = 9, /**< Queried device memory size increased */
rocblas_status_size_unchanged = 10, /**< Queried device memory size unchanged */
rocblas_status_invalid_value = 11, /**< Passed argument not valid */
rocblas_status_continue = 12, /**< Nothing preventing function to proceed */
rocblas_status_check_numerics_fail
= 13, /**< Will be set if the vector/matrix has a NaN/Infinity/denormal value */
rocblas_status_excluded_from_build
= 14, /**< Function is not available in build, likely a function requiring Tensile built without Tensile */
rocblas_status_arch_mismatch
= 15, /**< The function requires a feature absent from the device architecture */
} rocblas_status;
rocblas_status rocsolver_log_begin();
""")
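# Load the rocSOLVER shared library; adjust the path to match the installed version.
C = ffi.dlopen('/opt/rocm/lib/librocsolver.so.0.1.60000')
# Initialize rocSOLVER logging in this process.
C.rocsolver_log_begin()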
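The same call can also be sketched with the standard-library ctypes module, which avoids both the cffi dependency and the hard-coded library version. This is only a sketch under assumptions: the helper names `find_rocsolver` and `begin_rocsolver_logging` are hypothetical, `/opt/rocm/lib` is assumed to be the install location, and the script is assumed to run in the same process as torch so that both use the same loaded library.

```python
import ctypes
import glob
import os


def find_rocsolver(lib_dir="/opt/rocm/lib"):
    """Return the path of a librocsolver shared object in lib_dir, or None.

    Hypothetical helper; the default directory is an assumption.
    """
    candidates = glob.glob(os.path.join(lib_dir, "librocsolver.so*"))
    # Prefer the shortest name (typically an unversioned or SONAME symlink),
    # assuming all candidates resolve to the same library.
    return min(candidates, key=len) if candidates else None


def begin_rocsolver_logging(lib_path):
    """Load librocsolver and call rocsolver_log_begin(); return its status."""
    lib = ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    # rocblas_status is a C enum, so it is returned as an int (0 == success).
    lib.rocsolver_log_begin.restype = ctypes.c_int
    lib.rocsolver_log_begin.argtypes = []
    return lib.rocsolver_log_begin()
```

A typical use would be to set the ROCSOLVER_* environment variables from the earlier comment, call `begin_rocsolver_logging(find_rocsolver())`, check that the returned status is 0 (rocblas_status_success), and only then run the torch workload in the same process.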
This is a hipSOLVER problem, tracked in https://ontrack-internal.amd.com/browse/SWDEV-421983. I will re-test after hipSOLVER is fixed.