Transformer-XL training fails with IndexError due to a change in ModuleList behavior for torch>1.11
System Info
Transformers version: 4.24
Torch version: >1.11
Stacktrace:
```
venv/lib/python3.8/site-packages/transformers/models/transfo_xl/modeling_transfo_xl.py:1115: in forward
    softmax_output = self.crit(pred_hid, labels)
venv/lib/python3.8/site-packages/torch/nn/modules/module.py:1190: in _call_impl
    return forward_call(*input, **kwargs)
venv/lib/python3.8/site-packages/torch/nn/modules/module.py:1178: in _slow_forward
    result = self.forward(*input, **kwargs)
venv/lib/python3.8/site-packages/transformers/models/transfo_xl/modeling_transfo_xl_utilities.py:134: in forward
    head_weight, head_bias, head_proj = weights[0], biases[0], self.out_projs[0]
venv/lib/python3.8/site-packages/torch/nn/modules/container.py:282: in __getitem__
    return self._modules[self._get_abs_string_index(idx)]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = ModuleList(), idx = 0
    def _get_abs_string_index(self, idx):
        """Get the absolute index for the list of modules"""
        idx = operator.index(idx)
        if not (-len(self) <= idx < len(self)):
>           raise IndexError('index {} is out of range'.format(idx))
E           IndexError: index 0 is out of range
venv/lib/python3.8/site-packages/torch/nn/modules/container.py:272: IndexError
```
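Reading the check in `_get_abs_string_index`, the error means `self.out_projs` has length 0 by the time `forward` indexes it. Here is a minimal probe of that site; it pokes `ProjectedAdaptiveLogSoftmax` directly, which is an internal class, so treat the exact constructor arguments as my assumption:

```python
import torch
from transformers.models.transfo_xl.modeling_transfo_xl_utilities import (
    ProjectedAdaptiveLogSoftmax,
)

# d_proj == d_embed with div_val == 1 is the branch where the projection
# list receives a None entry instead of a Parameter.
crit = ProjectedAdaptiveLogSoftmax(
    n_token=100, d_embed=32, d_proj=32, cutoffs=[], div_val=1
)
print("torch", torch.__version__, "-> len(out_projs):", len(crit.out_projs))

# On affected torch versions len(out_projs) is 0, so the same lookup
# that forward() performs raises the IndexError from the trace above.
crit.out_projs[0]
```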
Please do let me know if further info is required.
Who can help?
@patrickvonplaten
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Use a generic torch src_token tensor as input, with d_model=d_embed, on torch>1.11; a sketch is below.
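A minimal sketch of such a reproducer, assuming a tiny config is enough to hit the same path; every hyperparameter value below is illustrative, the condition that matters is d_embed == d_model with div_val=1:

```python
import torch
from transformers import TransfoXLConfig, TransfoXLLMHeadModel

# Tiny illustrative config; the suspected trigger is d_embed == d_model.
config = TransfoXLConfig(
    vocab_size=100,
    cutoffs=[50],  # shrunk to fit the toy vocabulary
    d_model=32,
    d_embed=32,  # equal to d_model
    n_head=2,
    d_head=16,
    d_inner=64,
    n_layer=2,
    div_val=1,
)
model = TransfoXLLMHeadModel(config)

src_tokens = torch.randint(0, 100, (2, 10))
# On torch > 1.11 this call raises the IndexError from the stacktrace;
# on torch <= 1.11 it computes the training losses as expected.
outputs = model(input_ids=src_tokens, labels=src_tokens)
```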
Expected behavior
Training should work the same across torch versions.
Thanks for reporting but could you give us a short reproducer as our CI didn't catch any regression here?
I run it as part of fairseq. This test case, https://github.com/facebookresearch/fairseq/blob/main/tests/test_binaries.py#L1319, also fails for the same reason. IIUC, in the fairseq case d_embed=d_model; maybe this condition is required to reproduce the issue?
That's not exactly a small reproducer we can run on our side ;-)
Can you point me to the test case that exercises training of the Transformer-XL model in Hugging Face? Maybe I can tune its parameters accordingly to reproduce the issue.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Actually, this is still a problem. Can you please try it with the params d_embed and d_model set to the same value?
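For example, a hypothetical minimal config just to show the condition; the sizes are arbitrary:

```python
from transformers import TransfoXLConfig

# Arbitrary sizes; the condition that matters is d_embed == d_model.
config = TransfoXLConfig(d_model=128, d_embed=128, div_val=1)
assert config.d_embed == config.d_model
```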
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.