Ivan Yashchuk issues

Results: 27 issues by Ivan Yashchuk

Use [email protected] for FEniCS CI

Update registration of custom derivatives in JAX

See https://github.com/IvanYashchuk/jax-fenics-adjoint/issues/8

broadcast_in_dim: The size of contiguity must equal to the number of non-broadcasting IterDomains

### 🐛 Describe the bug

```py
from nvfuser import FusionDefinition, DataType
import torch

def nvfuser_fusion_id2(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(symbolic_sizes=[-1, 1], contiguous=[True, True], dtype=DataType.Double, is_cpu=False)
    T1 =...
```

nearbyint fails with NVRTC compile error for integer inputs

### 🐛 Describe the bug

```py
from nvfuser import FusionDefinition, DataType
import torch

with FusionDefinition() as fd:
    t1 = fd.define_tensor(symbolic_sizes=[-1], contiguous=[True], dtype=DataType.Int32)
    t2 = fd.ops.round(t1)
    fd.add_output(t2)

a = torch.ones(2, device="cuda",...
```

Silent wrong result with multiple aliasOutputToInput

### 🐛 Describe the bug

The second call to `aliasOutputToInput(source, dest)` is ignored unless tv4 in the example is added to outputs. Here's an initial version of the C++ test...

Alias

Worse performance than ATen: aten.any.default

### 🐛 Describe the bug

# aten.any

aten.any is used in AllenaiLongformerBase, BartForConditionalGeneration, BlenderbotSmallForConditionalGeneration, M2M100ForConditionalGeneration, MBartForConditionalGeneration, PLBartForConditionalGeneration, PegasusForConditionalGeneration. Here's the result comparing to ATen:

| benchmark | geomean | 20th...

Worse performance than ATen: aten._log_softmax_backward_data

### 🐛 Describe the bug

# aten._log_softmax_backward_data.default

Here's the result comparing to ATen:

| benchmark | geomean | 20th percentile | 50th percentile | 80th percentile |
|-------------|---------|-----------------|-----------------|-----------------|
| HuggingFace...

Worse performance than ATen: aten._log_softmax

### 🐛 Describe the bug

# aten._log_softmax.default

Here's the result comparing to ATen:

| benchmark | geomean | 20th percentile | 50th percentile | 80th percentile |
|-------------|---------|-----------------|-----------------|-----------------|
| HuggingFace...
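
For context on the operator named in this issue: log-softmax maps each element `x_i` of a row to `x_i - log(sum_j exp(x_j))`. The snippet below is a minimal pure-Python sketch of that formula for a single 1-D row, using the usual max-subtraction trick for numerical stability. The helper name `log_softmax_row` is hypothetical and is not part of PyTorch or nvFuser; it ignores the `dim` and `half_to_float` arguments of the real operator and says nothing about how either backend implements it.

```python
import math


def log_softmax_row(row):
    """Reference log-softmax over one 1-D list of floats (illustrative only)."""
    # Subtract the row maximum before exponentiating to avoid overflow.
    row_max = max(row)
    log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
    return [v - log_sum_exp for v in row]


if __name__ == "__main__":
    out = log_softmax_row([1.0, 2.0, 3.0])
    # Exponentiating the result should recover probabilities that sum to 1.
    print(sum(math.exp(v) for v in out))
```

A reference like this can be handy when checking a fused kernel's output on small inputs before comparing timings.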

HuggingFace DebertaForQuestionAnswering, DebertaForMaskedLM: The tensor has a non-zero number of elements

### 🐛 Describe the bug

```py
RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will...
```

Inconsistent parallelization found with series of trivial reductions

### 🐛 Describe the bug

This code is extracted from a portion of [`torch._decomp.decompositions.native_batch_norm`](https://github.com/pytorch/pytorch/blob/b6d6a78c12e5869d0c738456e28155a3a2554ece/torch/_decomp/decompositions.py#L1113-L1114) lowered to nvprims and manually translated to C++, maybe there's even more minimal code that fails,...