Failed to dump torchscript model for GPT2
System Info
Python version: 3.7
transformers version: 4.26.1
Who can help?
@ArthurZucker, @younesbelkada
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
```python
import torch
from typing import Dict
from transformers import GPT2LMHeadModel

model_inputs = dict(
    input_ids=torch.zeros((1, 1024), dtype=torch.long).cuda(),
    attention_mask=torch.ones((1, 1024), dtype=torch.long).cuda())

model = GPT2LMHeadModel.from_pretrained(args.model, torchscript=True).eval().cuda()

def dict_test(example_inputs: Dict[str, torch.Tensor]):
    return model(input_ids=example_inputs['input_ids'], attention_mask=example_inputs['attention_mask'])

model_scripted = torch.jit.trace(dict_test, model_inputs)
torch.jit.save(model_scripted, "traced_bert.pt")
```
I used the above code to generate a GPT2 TorchScript model and got the following error:

```
RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient
```

raised from:

```
"./transformers/models/gpt2/modeling_gpt2.py", line 830, in forward
    inputs_embeds = self.wte(input_ids)
```
Expected behavior
Tracing succeeds and the GPT2 model is saved as a TorchScript model.
Hey! Thanks for reporting. This is indeed a bug; we'll see what we can do to fix it!
Hi @zhuango,
I think your problem is more related to how you trace the model than to the transformers library itself. Since you're tracing a function (not the model itself), JIT trace knows nothing about the model parameters; it sees them as unnamed tensors that take part in the forward-pass computation. Because the origins of these tensors are unknown, it cannot build an autograd chain for them, and since those tensors have autograd enabled, it raises this error.
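The same failure can be reproduced with a tiny standalone example, independent of transformers (a minimal sketch to illustrate the mechanism):

```python
import torch

layer = torch.nn.Linear(4, 4)  # parameters require grad by default

def fn(x):
    # layer is captured from the enclosing scope, so its weight and bias
    # enter the trace as constants rather than as module parameters
    return layer(x)

# Raises: RuntimeError: Cannot insert a Tensor that requires grad as a constant.
torch.jit.trace(fn, torch.randn(1, 4))
```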
So, I see the following ways you could solve this:
- Disable autograd for all model parameters before tracing:

  ```python
  model.requires_grad_(False)
  ```

- Transform your `dict_test` function into a module that wraps the original model and trace that (this way JIT will discover the model parameters and the corresponding tensors and will be able to use autograd for them):

  ```python
  class DictModel(torch.nn.Module):
      def __init__(self, model):
          super().__init__()
          self.model = model

      def forward(self, inputs: Dict[str, torch.Tensor]):
          return self.model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])

  dict_model = DictModel(model)
  model_scripted = torch.jit.trace(dict_model, model_inputs)  # model_inputs from the reproduction above
  ```

- Just trace the model itself, passing the input parameters as a tuple instead of a dict (but I guess you intentionally use a dict to make the resulting TorchScript easier to use?):

  ```python
  torch.jit.trace(model, ...)
  ```
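For concreteness, here is how the first and third options could be applied to the reproduction above (a sketch, not from the original reply; the output filename is illustrative, and note that `attention_mask` is not the second positional parameter of `GPT2LMHeadModel.forward`, so the tensor-style call below traces only `input_ids`):

```python
# Option 1: freeze all parameters, then trace the free function as before.
model.requires_grad_(False)
model_scripted = torch.jit.trace(dict_test, model_inputs)

# Option 3: trace the model directly; a single example tensor binds to input_ids.
# To also trace attention_mask, prefer the DictModel wrapper above, since
# past_key_values (not attention_mask) is the second positional argument.
traced = torch.jit.trace(model, model_inputs['input_ids'])
torch.jit.save(traced, "traced_gpt2.pt")
```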
@ArthurZucker, let me know if you think this calls for any additions to the library itself or to the documentation.
Hi @vvmnnnkv, thanks a lot. That works for me.