Natalia Gimelshein

Search results: 11 issues by Natalia Gimelshein

See https://github.com/pytorch/pytorch/issues/71465. Slightly changes the LayerNorm2d implementation: 1) currently, when ln2d is called on a contiguous tensor, it accidentally switches most of the network into channels-last mode; line 114 undoes...
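The accidental memory-format switch can be illustrated with a common permute-based LayerNorm2d pattern. This is a hypothetical sketch, not the exact implementation the issue refers to: normalizing in NHWC and permuting back yields a tensor whose strides match channels-last, so everything downstream runs in that format.

```python
import torch
import torch.nn as nn

class LayerNorm2d(nn.LayerNorm):
    """Hypothetical permute-based LayerNorm2d, similar to common implementations."""
    def forward(self, x):
        # NCHW -> NHWC, normalize over the channel dim, then back to NCHW.
        x = x.permute(0, 2, 3, 1)
        x = super().forward(x)
        # The result of the second permute is no longer NCHW-contiguous;
        # its strides correspond to channels-last memory format.
        return x.permute(0, 3, 1, 2)

ln = LayerNorm2d(8)
out = ln(torch.randn(2, 8, 4, 4))  # contiguous NCHW input
# out.is_contiguous()                                       -> False
# out.is_contiguous(memory_format=torch.channels_last)      -> True
```

Because layer_norm materializes a contiguous NHWC intermediate, the final permute silently propagates channels-last strides to the rest of the network.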

The original PixelCNN paper (https://arxiv.org/pdf/1606.05328.pdf) uses a gated unit defined as tanh(a) * sigmoid(b). The same formulation of the gated unit is used in the WaveNet paper. Yet here you switched to a gated unit...
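The tanh(a) * sigmoid(b) formulation from those papers can be sketched in a few lines (the channel-chunking convention here is an assumption; papers differ in how a and b are produced):

```python
import torch

def gated_activation(x):
    # Split features into two halves a, b and combine them as in the
    # PixelCNN / WaveNet gated unit: tanh(a) * sigmoid(b).
    a, b = x.chunk(2, dim=1)
    return torch.tanh(a) * torch.sigmoid(b)
```

The tanh branch provides a bounded signal while the sigmoid branch acts as a learned per-feature gate on it.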

We are getting issues like https://github.com/pytorch/pytorch/issues/90170, and we'll be getting more of them, where people try to use new hardware (a 4090 in this case) with an old toolkit (and hence old...

enhancement
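The new-hardware/old-toolkit mismatch above boils down to a capability check. A hypothetical helper sketch (arch_supported and its compatibility rules are illustrative assumptions, with entries loosely mimicking the strings torch.cuda.get_arch_list() returns):

```python
def arch_supported(device_capability, compiled_arches):
    """Hypothetical check: can a binary built for `compiled_arches`
    (entries like 'sm_86' or 'compute_70') run on a device with the
    given (major, minor) compute capability?
    """
    major, minor = device_capability
    for arch in compiled_arches:
        kind, num = arch.split("_")
        a_major, a_minor = divmod(int(num), 10)
        # SASS (sm_XY) is binary-compatible within the same major
        # version for devices with an equal or newer minor version.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX (compute_XY) can be JIT-compiled forward to newer devices.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False

# A 4090 (compute capability 8.9) with an old toolkit build that only
# shipped kernels up to sm_75 and no forward-compatible PTX:
arch_supported((8, 9), ["sm_60", "sm_70", "sm_75"])  # False
```

An old-toolkit build with neither an sm_8x cubin nor embedded PTX simply has nothing the driver can run or JIT for the new device, which is what the linked issue reports.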

Repro: ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided, device from torch._inductor.codecache import AsyncCompile aten = torch.ops.aten assert_size_stride = torch._C._dynamo.guards.assert_size_stride async_compile = AsyncCompile()...

bug

Repro: ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided, device from torch._inductor.codecache import AsyncCompile aten = torch.ops.aten assert_size_stride = torch._C._dynamo.guards.assert_size_stride async_compile = AsyncCompile()...

bug

Repro below. The same kernel is called with AOT and JIT compilation, and AOT produces a wrong result. The mask, instead of having a single True element (it's computed as `...

bug

Repro below. The generated PTX looks valid in both cases, with the only difference being movs with `@!pxx`, as expected. Happens with fp16; float32 is OK. I'm deliberately setting `other` to...

bug

This might be related to #714. Repro below (comments inside; requires torchdynamo, unfortunately). tl;dr: if the kernel has `xnumel=` where `xnumel` is also a kernel arg and is equal to the...

This matters only for operands of different signs that are not exactly divisible. Repro: ``` from ctypes import c_void_p, c_long import torch import random from torch import empty_strided, as_strided...

bug
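The discrepancy described above is the classic truncating-vs-flooring division mismatch. A minimal illustration (trunc_div is a hypothetical stand-in for what a C-style generated kernel computes):

```python
def trunc_div(a, b):
    # C-style integer division truncates toward zero, which is the
    # default behavior of `/` in generated C/CUDA code.
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

# Python's // floors toward negative infinity, so the two disagree
# exactly when the operands have different signs and don't divide evenly:
assert trunc_div(-7, 2) == -3 and (-7) // 2 == -4  # different signs, inexact
assert trunc_div(7, 2) == 3 and 7 // 2 == 3        # same sign: identical
assert trunc_div(-8, 2) == -4 and (-8) // 2 == -4  # exactly divisible: identical
```

Whenever both operands share a sign, or the division is exact, truncation and flooring coincide, which is why the bug only surfaces for mixed-sign, inexact cases.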

Fixes #91758. I'm not in love with casting `cudaError` to and from `int`, but I couldn't avoid it without major refactors, and we need to fix this bug soon.