`IndexError: tuple index out of range` in `input_ids` during pre-training

Open ebrarkiziloglu opened this issue 11 months ago • 3 comments

Following the instructions in the README, I am trying to pre-train from scratch. I ran training with the Composer framework using the yamls/main/flex-bert-base.yaml config and a local copy of the c4 dataset at ./my-copy-c4. (I verified that the dataloader works by following the instructions.)

However, I encountered the following error during training:

IndexError: tuple index out of range

  File "/.../ModernBERT/src/bert_layers/embeddings.py", line 153, in forward
    position_ids = self.position_ids[:, 0 : input_ids.shape[1]]
                                            ~~~~~~~~~~~~~~~^^^

Steps to Reproduce

  1. Prepare the c4 dataset.
  2. Set up the conda environment per instructions.
  3. Run training with composer main.py yamls/main/flex-bert-base.yaml
  4. The error occurs during training in bert_layers/embeddings.py

ebrarkiziloglu avatar Feb 13 '25 09:02 ebrarkiziloglu

Sorry for the delay, I'll try to have a look at this.

NohTow avatar Feb 24 '25 16:02 NohTow

Hello again,

I just tested and there is indeed an issue in flex-bert-base.yaml. Those configurations are outdated anyway, and you should be able to run your tests by using these configurations instead; make sure to change the path of the dataset (I also set streaming to False, as well as sequence_packing). We should merge this branch and update the README (as well as remove the useless/outdated configurations) ASAP to avoid such issues, sorry about that.

(FWIU, copying the model_config of ModernBERT into the old config worked; I think it's because we don't use positional encoding anymore. I won't debug much more as we are deprecating those.)
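One possible reading of the traceback, sketched in plain Python: `input_ids.shape[1]` can only raise `IndexError: tuple index out of range` when the shape has fewer than two entries, for example when the batch reaches the embedding layer as a flattened 1-D tensor rather than as `(batch, seq_len)`. That 1-D input is an assumption here and is not confirmed anywhere in this thread. The helper `slice_position_ids` is hypothetical and only mimics the failing line; plain tuples stand in for `torch.Size`, which is a tuple subclass.

```python
def slice_position_ids(position_ids, input_shape):
    """Mimic `self.position_ids[:, 0 : input_ids.shape[1]]` without torch.

    `position_ids` is a list of rows (a stand-in for the position buffer) and
    `input_shape` is a plain tuple standing in for `input_ids.shape`.
    """
    seq_len = input_shape[1]  # fails when input_shape has a single entry
    return [row[:seq_len] for row in position_ids]


# Stand-in for a position buffer of shape (1, 512); the size is illustrative.
position_ids = [list(range(512))]

# Padded batch: shape (batch, seq_len) has two entries, so the slice works.
padded = slice_position_ids(position_ids, (4, 128))
print(len(padded[0]))

# Assumed failing case: a flattened batch of shape (total_tokens,).
try:
    slice_position_ids(position_ids, (512,))
    message = None
except IndexError as err:
    message = str(err)
print(message)
```

If this is what happens, printing `input_ids.shape` just before the failing line in embeddings.py would confirm it. It would also fit the observation that a model_config without absolute positional embeddings avoids the error, since that code path never indexes `shape[1]`.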

cc @warner-benjamin

NohTow avatar Feb 24 '25 16:02 NohTow

Hi, thank you for your comment. We saw them and started using them, but we hit a wall again, which we solved by fixing part of the code.

Could you also look at PR #205? That PR was also necessary for us to move on.

onurgu avatar Feb 24 '25 21:02 onurgu