eqy
Prototype version with "cuDNN conv style" caching.
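For context, a minimal sketch of the general pattern "cuDNN conv style" caching refers to: benchmark candidates once per problem description and reuse the winner on cache hits. The helper names and key contents below are illustrative assumptions, not this PR's implementation.

```python
# Illustrative sketch only: key the cache on everything that can change
# which kernel wins, then pay the benchmarking cost once per unique key.
_plan_cache = {}

def _problem_key(x, weight, stride, padding):
    # Assumed key contents; a real cache may also key on layout and device.
    return (tuple(x.shape), x.dtype, tuple(weight.shape), stride, padding)

def cached_plan(x, weight, stride, padding, benchmark_fn):
    key = _problem_key(x, weight, stride, padding)
    if key not in _plan_cache:
        # Expensive autotuning runs only on a cache miss.
        _plan_cache[key] = benchmark_fn(x, weight, stride, padding)
    return _plan_cache[key]
```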
Fix for an issue surfaced on the discuss forum: https://discuss.pytorch.org/t/cuda-error-cublas-status-not-supported-when-calling-cublasltmatmul-from-torch-nn-functional-linear/170214 Note that PyTorch builds before #71200 should not be affected, as there was no `cublasLt` dispatch path. Additionally, the provided...
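As a rough illustration (not the actual fix, which lives in the C++ dispatch code), a guard of this shape falls back from the `cublasLt` path when operands don't meet an alignment requirement. The 16-byte figure and the fallback expression are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def linear_with_alignment_fallback(x, w, b):
    # Assumed requirement: cublasLtMatmul wants 16-byte-aligned operands.
    if all(t.data_ptr() % 16 == 0 for t in (x, w, b)):
        return F.linear(x, w, b)        # may dispatch to cublasLtMatmul
    return torch.addmm(b, x, w.t())     # plain matmul-with-bias fallback
```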
Removing stragglers missed in previous CUDA < 11.0 cleanup PRs.
Support for nonblocking NCCL communicators/fault tolerance/checking, added in NCCL 2.14 as an experimental feature. Enabled via the environment variable:

```
NCCL_USE_COMM_NONBLOCKING=1
```

CC @ptrblck
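A minimal sketch of enabling it from Python rather than the launcher; this assumes a torchrun-style launcher already provides rank and world size, and the variable must be in the environment before any NCCL communicator is created.

```python
import os

# Experimental: set before any NCCL communicator exists.
os.environ["NCCL_USE_COMM_NONBLOCKING"] = "1"

import torch.distributed as dist

# Subsequent NCCL communicator creation honors the flag.
dist.init_process_group(backend="nccl")
```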
Hopefully addresses the failure seen when trying to bump to 1.1.0 (#119642) CC @Skylion007 cc @csarofeen @ptrblck @xwang233
The cuBLAS team has indicated that certain kernels will transition from the CUDA runtime API to the driver API, which we've observed to break existing tests (e.g., DataParallel) that use multithreading...
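For reference, a minimal sketch of the multithreaded pattern in question: one host thread per device issuing cuBLAS work concurrently, roughly what the DataParallel tests exercise. Shapes and the single GEMM per thread are placeholders.

```python
import threading
import torch

def worker(device):
    # cuBLAS GEMM issued from a non-main host thread.
    a = torch.randn(256, 256, device=device)
    _ = a @ a
    torch.cuda.synchronize(device)

threads = [
    threading.Thread(target=worker, args=(f"cuda:{i}",))
    for i in range(torch.cuda.device_count())
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```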
The current cuDNN submodule is ancient; this shouldn't break the build, right?
GNMT won't run without `dllogger`.
### 🐛 Describe the bug

Unsure why this hasn't surfaced in upstream CI, as we've observed it on sm90, sm86, sm80, sm60, ... It doesn't surface on e.g. `numpy==1.24.4` but starts...
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda`, while `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes. What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure...
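For reference, a rough sketch of the comparison that test pair makes, using the public `causal_lower_right` bias; the shapes below are placeholders, not the test's actual `shape0`.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention.bias import causal_lower_right

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
v = torch.randn_like(k)
bias = causal_lower_right(q.shape[-2], k.shape[-2])  # LOWER_RIGHT variant

eager = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)
compiled = torch.compile(F.scaled_dot_product_attention)(q, k, v, attn_mask=bias)
# The eager result is the baseline; the compiled path is the one failing.
torch.testing.assert_close(eager, compiled)
```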