
[training] Fails to run the huggingface example when batch size is 1

Open · feifeibear opened this issue on Sep 29, 2021 · 3 comments

Hello, I am trying to run the example in lightseq/examples/training/huggingface. Because I use a gaming PC, I slightly modified the run_ner.sh script (two lines, as follows).

python3 -m torch.distributed.launch \
   --nproc_per_node=1 \
   $THIS_DIR/run_ner.py \
-  --model_name_or_path bert-large-uncased \
-  --per_device_train_batch_size 16 \
+  --model_name_or_path bert-base-uncased \
+  --per_device_train_batch_size 1 \
   --dataset_name conll2003 \
   --output_dir /tmp/test-ner \
   --do_train \

The program crashes at this line:

File "/home/user/anaconda3/envs/deepspd/lib/python3.7/site-packages/lightseq/training/ops/pytorch/transformer_encoder_layer.py", line 288, in forward
    assert bs == encoder_padding_mask.size(0) and sl == encoder_padding_mask.size(1)
AssertionError
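To show the failure mode, here is a minimal sketch of the same shape check. The shapes and the squeeze() call are my assumptions about where the batch dimension gets lost, not the actual lightseq internals; only the assert itself is taken from the traceback.

    import torch

    # Hypothetical shapes; bs/sl follow the names in the traceback.
    hidden_states = torch.zeros(1, 128, 768)    # (batch, seq_len, hidden)
    encoder_padding_mask = torch.zeros(1, 128)  # (batch, seq_len)

    # If upstream code squeezes away the size-1 batch dimension (an
    # assumption about the bug), the mask becomes 1-D with shape (128,) ...
    encoder_padding_mask = encoder_padding_mask.squeeze()

    bs, sl = hidden_states.size(0), hidden_states.size(1)
    # ... and the check from the traceback raises AssertionError,
    # because bs == 1 while encoder_padding_mask.size(0) == 128.
    assert bs == encoder_padding_mask.size(0) and sl == encoder_padding_mask.size(1)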

The software versions I used:

transformers  4.11.0
lightseq      2.1.4
torch         1.7.1+cu110
torchaudio    0.7.2
torchvision   0.8.2+cu110

Cuda compilation tools, release 11.1, V11.1.105

feifeibear · Sep 29 '21 03:09

I guess the error comes from setting the batch size to 1. If I set per_device_train_batch_size to 2, it works.
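If batch size 1 is really needed before a fix lands, a defensive guard in user code could re-add the missing batch dimension before the mask reaches the encoder layer. This is a hypothetical user-side sketch, not part of the lightseq API:

    import torch

    def ensure_2d_mask(mask: torch.Tensor, bs: int, sl: int) -> torch.Tensor:
        # Re-add a batch dimension that may have been squeezed away when bs == 1.
        if mask.dim() == 1 and mask.size(0) == sl:
            mask = mask.unsqueeze(0)  # (seq_len,) -> (1, seq_len)
        assert mask.shape == (bs, sl)
        return mask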

feifeibear · Sep 29 '21 03:09

Thanks, Jiarui. It seems like an assertion bug for batch size 1; we'll fix it. BTW, is Turbo working on training? I'm looking forward to it.

Taka152 · Sep 29 '21 05:09

> Thanks, Jiarui. It seems like an assertion bug for batch size 1; we'll fix it. BTW, is Turbo working on training? I'm looking forward to it.

Haha, thanks for your attention. Turbo will not (or never) support training :). LightSeq did an amazing job on this point. I appreciate your efforts in training acceleration. I tested it on BERT training cases and noticed quite a significant speedup.

feifeibear · Sep 29 '21 07:09