transformers
Add ONNX support for Longformer
What does this PR do?
This PR contributes to #16308 and addresses #16463 by adding support for exporting Longformer to ONNX.
The following necessary changes were already made:
- [x] `LongformerOnnxConfig` implemented
- [x] ONNX opset version >= 12
- [x] fix in model definition with `nn.functional.pad` (see https://github.com/huggingface/transformers/issues/13126#issuecomment-993645323)
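For context, the `nn.functional.pad` fix keeps the sequence-length padding traceable for export. A minimal sketch of the idea — function name, window size, and tensors here are illustrative, not the actual `modeling_longformer.py` code:

```python
import torch
import torch.nn.functional as F

def pad_to_window_multiple(input_ids: torch.Tensor, attention_window: int, pad_token_id: int):
    """Pad the sequence length up to a multiple of the attention window.

    Longformer requires seq_len % attention_window == 0; using F.pad keeps
    the operation traceable for ONNX export, instead of building a new
    tensor with Python-side control flow.
    """
    seq_len = input_ids.shape[1]
    padding_len = (attention_window - seq_len % attention_window) % attention_window
    # F.pad pads the last dimension with (left, right) amounts
    return F.pad(input_ids, (0, padding_len), value=pad_token_id), padding_len

ids = torch.ones(1, 10, dtype=torch.long)
padded, n = pad_to_window_multiple(ids, attention_window=8, pad_token_id=0)
print(padded.shape, n)  # torch.Size([1, 16]) 6
```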
However, there are still some open issues I'd need help with:
- [x] ~~The conversion to ONNX fails when a `global_attention_mask` is provided that contains at least one `1`. It raises the following error: `Only consecutive 1-d tensor indices are supported in exporting aten::index_put to ONNX.` So far, I have been unable to track down which line triggers this error. If we find it, we can probably rewrite the model implementation using this workaround: https://pytorch.org/docs/stable/onnx.html#writes-sets~~ → issue resolved by rewriting the accesses
- [x] ~~The validation check currently fails with a high value difference (3.77). The JIT conversion raises the following warnings. Maybe some of them are the reasons for it:~~ → tracked down and fixed
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:1569: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if padding_len > 0:
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:1256: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
is_global_attn = is_index_global_attn.flatten().any().item()
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:569: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert (
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:805: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert (
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:808: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert query.size() == key.size()
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:598: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert list(attn_scores.size()) == [
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:873: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert seq_len % (window_overlap * 2) == 0
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:874: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert attn_probs.size()[:3] == value.size()[:3]
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:875: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert attn_probs.size(3) == 2 * window_overlap + 1
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:669: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert attn_output.size() == (batch_size, seq_len, self.num_heads, self.head_dim), "Unexpected size"
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:1312: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if padding_len > 0:
Before submitting
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case: #16308, #16463
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests? → default Longformer and ONNX tests
Who can review?
Maybe @ChainYo and/or @lewtun can help with this? 😊
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hey ✋ excellent PR, the code looks just fine!
I wonder if you tried to specify the right `--feature` while converting your LongFormer model? Which model did you try and what `--feature` did you choose?
> Hey ✋ excellent PR, the code looks just fine!
> I wonder if you tried to specify the right `--feature` while converting your LongFormer model? Which model did you try and what `--feature` did you choose?

Thanks!
I'm currently experimenting with `longformer-base-4096`. The reported difference of 3.77 is with `--feature=default`, but there are large differences with all other features as well (`masked-lm`: 14.1, `sequence-classification`: 0.04, `question-answering`: 0.25, `token-classification`: 0.19, `multiple-choice`: 0.1).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey @deutschmn, did you finally get good results with `Longformer`?
@ChainYo Unfortunately, I didn't get a chance to dive in further yet. I'll try to find some time, but if someone else has any ideas, please let me know.
Hey @ChainYo! I found some time and fixed the issues. Can we reopen? 😊
Adding support for the `global_attention_mask` was pretty easy after I tracked down the unsupported indexing lines, but it took quite a deep dive to find out where the value difference came from. There were two main issues:
- `masked_fill_` produces different results when converting to ONNX. I replaced it with a simple `where`.
- `as_strided` for chunking doesn't work either, presumably because it relies on the underlying memory layout that's different in ONNX. The perfect solution would be to use `unfold`, but unfortunately, that op is not supported. So I added a slow fallback that works in every case. Once there's support for `unfold`, we can get rid of that.
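A toy sketch of both fixes — tensor names and sizes are made up for illustration, not the actual `modeling_longformer.py` changes:

```python
import torch

# (1) masked_fill_ -> where: the in-place fill exported incorrectly, while
# the out-of-place where traces reliably. In eager mode both agree.
scores = torch.randn(2, 4)
mask = torch.tensor([[True, False, True, False],
                     [False, False, True, True]])

filled = scores.clone()
filled.masked_fill_(mask, 0.0)                                   # before
replaced = torch.where(mask, torch.zeros_like(scores), scores)   # after
assert torch.equal(filled, replaced)

# (2) as_strided -> slicing fallback: as_strided depends on a contiguous
# memory layout that ONNX does not guarantee, so overlapping chunks can
# instead be gathered with plain slices (slower, but layout-independent).
hidden = torch.arange(8.0)   # stand-in for (seq_len,) hidden states
window, step = 4, 2          # 50% overlap, as in Longformer's chunking
chunks = torch.stack([hidden[i:i + window]
                      for i in range(0, hidden.numel() - window + 1, step)])
print(chunks.shape)  # torch.Size([3, 4])
```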
> Hey @ChainYo! I found some time and fixed the issues. Can we reopen?
Hey, thanks for iterating on this. I will ping @lewtun to open this again.
Thanks a lot for re-working on this @deutschmn ❤️ ! Ping me when you'd like a review :)
Thanks for reopening, @lewtun. Would be brilliant if you could review now 😊
Thanks for your reviews, @lewtun and @patrickvonplaten 😊 I incorporated all your feedback and added Longformer to the ONNX tests. Slow ONNX + Longformer tests seem to work fine:
RUN_SLOW=1 pytest tests/models/longformer/test_modeling_longformer.py
→ 55 passed, 10 skipped, 14 warnings
=================================================================== test session starts ===================================================================
platform darwin -- Python 3.9.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/patrick/Projects/open-source/transformers, configfile: setup.cfg
plugins: xdist-2.5.0, hypothesis-6.46.3, forked-1.4.0, timeout-2.1.0, dash-2.4.1
collected 65 items
tests/models/longformer/test_modeling_longformer.py ...s.sss..................... [100%]
============================= warnings summary =============================
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/Projects/open-source/transformers/src/transformers/image_utils.py:222: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
def resize(self, image, size, resample=PIL.Image.BILINEAR, default_to_square=True, max_size=None):
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torchvision/transforms/functional_pil.py:228: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
interpolation: int = Image.BILINEAR,
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torchvision/transforms/functional_pil.py:295: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
interpolation: int = Image.NEAREST,
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torchvision/transforms/functional_pil.py:311: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
interpolation: int = Image.NEAREST,
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torchvision/transforms/functional_pil.py:328: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
interpolation: int = Image.BICUBIC,
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/timm/data/auto_augment.py:39: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/timm/data/auto_augment.py:39: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/timm/data/transforms.py:39: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
Image.NEAREST: 'nearest',
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/timm/data/transforms.py:40: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
Image.BILINEAR: 'bilinear',
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/timm/data/transforms.py:41: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
Image.BICUBIC: 'bicubic',
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/timm/data/transforms.py:42: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
Image.BOX: 'box',
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/timm/data/transforms.py:43: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
Image.HAMMING: 'hamming',
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/timm/data/transforms.py:44: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
Image.LANCZOS: 'lanczos',
tests/models/longformer/test_modeling_longformer.py::LongformerModelTest::test_training_gradient_checkpointing
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 55 passed, 10 skipped, 14 warnings in 86.62s (0:01:26) ==========
RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py -k "longformer"
→ 12 passed, 377 deselected, 228 warnings
=========================================================================================== test session starts ===========================================================================================
platform darwin -- Python 3.9.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/patrick/Projects/open-source/transformers, configfile: setup.cfg
plugins: xdist-2.5.0, hypothesis-6.46.3, forked-1.4.0, timeout-2.1.0, dash-2.4.1
collected 389 items / 377 deselected / 12 selected
tests/onnx/test_onnx_v2.py ............ [100%]
============================================================================================ warnings summary =============================================================================================
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:1610: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if padding_len > 0:
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torch/_tensor.py:627: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return self.item().__format__(format_spec)
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torch/nn/functional.py:2165: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert padding_idx < weight.size(0), "Padding_idx must be within num_embeddings"
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:1297: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
is_global_attn = is_index_global_attn.flatten().any().item()
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:565: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert (
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:832: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert (
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:835: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert query.size() == key.size()
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:785: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if hidden_states.size(1) == window_overlap * 2:
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:594: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert list(attn_scores.size()) == [
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:900: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert seq_len % (window_overlap * 2) == 0
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:901: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert attn_probs.size()[:3] == value.size()[:3]
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:902: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert attn_probs.size(3) == 2 * window_overlap + 1
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:668: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert attn_output.size() == (batch_size, seq_len, self.num_heads, self.head_dim), "Unexpected size"
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:1072: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert list(global_attn_scores.size()) == [
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:1122: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert list(global_attn_output.size()) == [
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:691: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
len(is_local_index_global_attn_nonzero[0]), -1
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/Projects/open-source/transformers/src/transformers/models/longformer/modeling_longformer.py:1353: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if padding_len > 0:
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torch/onnx/symbolic_helper.py:719: UserWarning: allowzero=0 by default. In order to honor zero value in shape use allowzero=1
warnings.warn("allowzero=0 by default. In order to honor zero value in shape use allowzero=1")
tests/onnx/test_onnx_v2.py: 12 warnings
/Users/patrick/.pyenv-x86/versions/3.9.10/envs/transformers-x86_64/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py:2905: UserWarning: Exporting aten::index operator of advanced indexing in opset 14 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
warnings.warn("Exporting aten::index operator of advanced indexing in opset " +
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================== 12 passed, 377 deselected, 228 warnings in 3599.78s (0:59:59) ======================================================================
I merged `main` into this branch to resolve conflicts. Gently pinging @lewtun and @patrickvonplaten for a re-review 😊
> Hey @ChainYo! I found some time and fixed the issues. Can we reopen? 😊
> Adding support for the `global_attention_mask` was pretty easy after I tracked down the unsupported indexing lines, but it took quite a deep dive to find out where the value difference came from. There were two main issues:
> - `masked_fill_` produces different results when converting to ONNX. I replaced it with a simple `where`.
> - `as_strided` for chunking doesn't work either, presumably because it relies on the underlying memory layout that's different in ONNX. The perfect solution would be to use `unfold`, but unfortunately, that op is not supported. So I added a slow fallback that works in every case. Once there's support for `unfold`, we can get rid of that.
Hi @deutschmn, thanks for contributing! As for the tracing problem of `masked_fill_` and `as_strided`: they are both supported in `torch.onnx.symbolic_opset9`. Have you tried intercepting the forward pass of `LongformerSelfAttention` with a `symbolic` method to apply the symbolic tracing?
REF:
- Symbolic doc in PyTorch
- An example: how it was done for DeBERTa
https://github.com/huggingface/transformers/blob/df28de0581aaf6d8742c4988137caac2b6602ca8/src/transformers/models/deberta/modeling_deberta.py#L122-L137
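The DeBERTa trick wraps the problematic op in a `torch.autograd.Function` whose `symbolic` staticmethod tells the exporter which ONNX nodes to emit. A rough sketch of the pattern — the class, op choice, and graph calls are illustrative, not Longformer's actual code:

```python
import torch

class MaskedSoftmax(torch.autograd.Function):
    """Eager forward pass plus a hand-written ONNX lowering, loosely
    following the pattern of DeBERTa's XSoftmax."""

    @staticmethod
    def forward(ctx, scores, mask):
        # mask is True where attention is allowed
        scores = scores.masked_fill(~mask, torch.finfo(scores.dtype).min)
        return torch.softmax(scores, dim=-1)

    @staticmethod
    def symbolic(g, scores, mask):
        # Called by torch.onnx.export instead of tracing forward();
        # g.op emits ONNX graph nodes directly, sidestepping ops the
        # tracer cannot export.
        from torch.onnx.symbolic_opset9 import masked_fill, softmax
        inv_mask = g.op("Not", mask)
        neg = g.op("Constant", value_t=torch.tensor(torch.finfo(torch.float32).min))
        return softmax(g, masked_fill(g, scores, inv_mask, neg), -1)

x = torch.randn(1, 3)
m = torch.tensor([[True, True, False]])
out = MaskedSoftmax.apply(x, m)
print(out)  # row sums to 1, masked position ~0
```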
Hey @JingyaHuang, thanks for your feedback 😊 I haven't looked into symbolic tracing yet. I'm travelling right now, but I'll have another look when I'm back in a couple of weeks.