Failed to dump torchscript model for GPT2
System Info
Python version: 3.7
transformers version: 4.26.1
Who can help?
@ArthurZucker, @younesbelkada
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
```python
import torch
from typing import Dict
from transformers import GPT2LMHeadModel

model_inputs = dict(
    input_ids=torch.zeros((1, 1024), dtype=torch.long).cuda(),
    attention_mask=torch.ones((1, 1024), dtype=torch.long).cuda())

model = GPT2LMHeadModel.from_pretrained(args.model, torchscript=True).eval().cuda()

def dict_test(example_inputs: Dict[str, torch.Tensor]):
    return model(input_ids=example_inputs['input_ids'], attention_mask=example_inputs['attention_mask'])

model_scripted = torch.jit.trace(dict_test, model_inputs)
torch.jit.save(model_scripted, "traced_bert.pt")
```
I used the above code to generate a GPT2 TorchScript model and got the following error:

```
RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient
```

raised from:

```
"./transformers/models/gpt2/modeling_gpt2.py", line 830, in forward
    inputs_embeds = self.wte(input_ids)
```
Expected behavior
Tracing succeeds and the GPT2 model is saved as a TorchScript model.
Hey! Thanks for reporting. This is indeed a bug; we'll see what we can do to fix it!
Hi @zhuango,
I think your problem is more related to how you trace the model than to the transformers library itself. Since you're tracing a function (not the model itself), JIT trace knows nothing about the model parameters; it sees them as unnamed tensors that take part in the forward-pass computation. Because the origins of these tensors are unknown, it cannot build an autograd chain for them, and since those tensors have autograd enabled, it raises this error.
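The same failure can be reproduced with a tiny standalone example, independent of transformers (a minimal sketch to illustrate the mechanism):

```python
import torch

layer = torch.nn.Linear(4, 4)  # parameters require grad by default

def fn(x):
    # layer is captured from the enclosing scope, so its weight and bias
    # enter the trace as constants rather than as module parameters
    return layer(x)

# Raises: RuntimeError: Cannot insert a Tensor that requires grad as a constant.
torch.jit.trace(fn, torch.randn(1, 4))
```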
So, I see the following ways you could solve this:
- Disable autograd for all model parameters before tracing:

  ```python
  model.requires_grad_(False)
  ```

- Transform your `dict_test` function into a module that wraps the original model and trace that (this way JIT will discover the model parameters and the corresponding tensors and will be able to use autograd for them):

  ```python
  class DictModel(torch.nn.Module):
      def __init__(self, model):
          super().__init__()
          self.model = model

      def forward(self, inputs: Dict[str, torch.Tensor]):
          return self.model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])

  dict_model = DictModel(model)
  model_scripted = torch.jit.trace(dict_model, model_inputs)  # model_inputs from the reproduction above
  ```

- Just trace the model itself, passing the input parameters as a tuple instead of a dict (but I guess you intentionally use a dict to make the resulting TorchScript easier to use?):

  ```python
  torch.jit.trace(model, ...)
  ```
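For concreteness, here is how the first and third options could be applied to the reproduction above (a sketch, not from the original reply; the output filename is illustrative, and note that `attention_mask` is not the second positional parameter of `GPT2LMHeadModel.forward`, so the tensor-style call below traces only `input_ids`):

```python
# Option 1: freeze all parameters, then trace the free function as before.
model.requires_grad_(False)
model_scripted = torch.jit.trace(dict_test, model_inputs)

# Option 3: trace the model directly; a single example tensor binds to input_ids.
# To also trace attention_mask, prefer the DictModel wrapper above, since
# past_key_values (not attention_mask) is the second positional argument.
traced = torch.jit.trace(model, model_inputs['input_ids'])
torch.jit.save(traced, "traced_gpt2.pt")
```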
@ArthurZucker, let me know if you think this calls for any additions to the library itself or to the documentation.
Hi @vvmnnnkv, thanks a lot. That works for me.