eqy
Prototype version with "cuDNN conv style" caching.
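For context, a minimal sketch of the general pattern "cuDNN conv style" caching refers to: benchmark candidates once per problem description and reuse the winner on cache hits. The helper names and key contents below are illustrative assumptions, not this PR's implementation.

```python
# Illustrative sketch only: key the cache on everything that can change
# which kernel wins, then pay the benchmarking cost once per unique key.
_plan_cache = {}

def _problem_key(x, weight, stride, padding):
    # Assumed key contents; a real cache may also key on layout and device.
    return (tuple(x.shape), x.dtype, tuple(weight.shape), stride, padding)

def cached_plan(x, weight, stride, padding, benchmark_fn):
    key = _problem_key(x, weight, stride, padding)
    if key not in _plan_cache:
        # Expensive autotuning runs only on a cache miss.
        _plan_cache[key] = benchmark_fn(x, weight, stride, padding)
    return _plan_cache[key]
```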
Fix for an issue surfaced on the discuss forum: https://discuss.pytorch.org/t/cuda-error-cublas-status-not-supported-when-calling-cublasltmatmul-from-torch-nn-functional-linear/170214 Note that PyTorch builds before #71200 should not be affected, as there was no `cublasLt` dispatch path. Additionally, the provided...
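As a rough illustration (not the actual fix, which lives in the C++ dispatch code), a guard of this shape falls back from the `cublasLt` path when operands don't meet an alignment requirement. The 16-byte figure and the fallback expression are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def linear_with_alignment_fallback(x, w, b):
    # Assumed requirement: cublasLtMatmul wants 16-byte-aligned operands.
    if all(t.data_ptr() % 16 == 0 for t in (x, w, b)):
        return F.linear(x, w, b)        # may dispatch to cublasLtMatmul
    return torch.addmm(b, x, w.t())     # plain matmul-with-bias fallback
```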
Removing stragglers missed in previous CUDA < 11.0 cleanup PRs.
Support for nonblocking NCCL communicators/fault tolerance/checking, added in NCCL 2.14 as an experimental feature. Enabled via the environment variable:

```
NCCL_USE_COMM_NONBLOCKING=1
```

CC @ptrblck
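A minimal sketch of enabling it from Python rather than the launcher; this assumes a torchrun-style launcher already provides rank and world size, and the variable must be in the environment before any NCCL communicator is created.

```python
import os

# Experimental: set before any NCCL communicator exists.
os.environ["NCCL_USE_COMM_NONBLOCKING"] = "1"

import torch.distributed as dist

# Subsequent NCCL communicator creation honors the flag.
dist.init_process_group(backend="nccl")
```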
Hopefully addresses the failure seen when trying to bump to 1.1.0 (#119642) CC @Skylion007 cc @csarofeen @ptrblck @xwang233
The cuBLAS team has indicated that certain kernels will transition from the CUDA runtime API to the driver API, which we've observed to break existing tests (e.g., DataParallel) that use multithreading...
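For reference, a minimal sketch of the multithreaded pattern in question: one host thread per device issuing cuBLAS work concurrently, roughly what the DataParallel tests exercise. Shapes and the single GEMM per thread are placeholders.

```python
import threading
import torch

def worker(device):
    # cuBLAS GEMM issued from a non-main host thread.
    a = torch.randn(256, 256, device=device)
    _ = a @ a
    torch.cuda.synchronize(device)

threads = [
    threading.Thread(target=worker, args=(f"cuda:{i}",))
    for i in range(torch.cuda.device_count())
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```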
The current cuDNN submodule is ancient; this shouldn't break the build, right?
GNMT won't run without `dllogger`.
### 🐛 Describe the bug

Unsure why this hasn't surfaced in upstream CI, as we've observed it on sm90, sm86, sm80, sm60, ... It doesn't surface on e.g. `numpy==1.24.4` but starts...
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda`, while `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes. What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure...
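For reference, a rough sketch of the comparison that test pair makes, using the public `causal_lower_right` bias; the shapes below are placeholders, not the test's actual `shape0`.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention.bias import causal_lower_right

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
v = torch.randn_like(k)
bias = causal_lower_right(q.shape[-2], k.shape[-2])  # LOWER_RIGHT variant

eager = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)
compiled = torch.compile(F.scaled_dot_product_attention)(q, k, v, attn_mask=bias)
# The eager result is the baseline; the compiled path is the one failing.
torch.testing.assert_close(eager, compiled)
```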