HuggingFace DebertaForQuestionAnswering, DebertaForMaskedLM: The tensor has a non-zero number of elements
🐛 Describe the bug
RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory
is raised by the following minimal reproducer:
import torchdynamo
import torch
def forward():
    # Allocate both tensors directly on the first CUDA device.
    ones = torch.ops.aten.ones.default([4, 512], device=torch.device(type='cuda', index=0), pin_memory=False)
    zeros = torch.ops.aten.zeros.default([4, 512], dtype=torch.int64, device=torch.device(type='cuda', index=0), pin_memory=False)
    return (ones, zeros)
f = torchdynamo.optimize(backend="nvprims_nvfuser")(forward)
f()
Versions
torchbenchPerf branch + https://github.com/IvanYashchuk/torchdynamo/tree/nvfuser-cudagraphify