
[LayoutReader] Training loss is low but inference performs terribly

Mountchicken opened this issue · 5 comments

Describe Model I am using (UniLM, MiniLM, LayoutLM ...): LayoutReader

Hi @zlwang-cs, I am using LayoutReader to predict layout-only data like the attached example (12_MTH_1), where the reading order is from right to left, top to bottom.

However, I encountered some problems and hope you can share some insights.

  • When I tried to train the model, an error occurred at line 671 saying that the dimensions of the two summed tensors are not aligned; when I remove the self.bias, I can train normally: https://github.com/microsoft/unilm/blob/cd2eb8ade8b6e475aefa9b769ced2eefc4245a3e/layoutreader/s2s_ft/modeling.py#L669-L673
  • The training process is normal: the initial loss is around 30 and drops to 0.01. However, when I test on the test set (shuffle rate = 1.0), the results are very poor: a lot of boxes are missing and the ARD is around 20.1. When I pre-sort the test-set inputs with rules (simply ordered by x coordinate; see the sketch below) and feed them into the network, it just predicts the exact same reading order as the inputs. What's more, when I tested on the training set, the results were also very poor.
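
For reference, this is roughly the rule-based pre-sort I mean; the box format [x_min, y_min, x_max, y_max] is just an assumption for the sketch, not my exact pipeline code:

    # Minimal sketch of a rule-based pre-sort by x coordinate.
    # Assumed box format: [x_min, y_min, x_max, y_max], one box per text line.
    def presort_by_x(boxes):
        """Return indices that order the boxes by their left x coordinate."""
        return sorted(range(len(boxes)), key=lambda i: boxes[i][0])

    boxes = [[520, 40, 600, 60], [100, 42, 180, 61], [310, 44, 390, 62]]
    print(presort_by_x(boxes))  # [1, 2, 0]: smallest x_min first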

Mountchicken · Aug 15 '22 01:08

Hi, thanks for your interest in our paper. I'd love to help you fix the issues.

For the first question, I am not sure what the problem is without more detailed information. I would recommend using breakpoints or other debugging tools to see what the tensor shapes look like at this step.
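
For instance, a quick check just above the failing addition would show whether the two operands can broadcast; the names and shapes below are placeholders, not the exact variables in modeling.py:

    import torch

    # Placeholder shapes: substitute the two tensors actually being added
    # around s2s_ft/modeling.py L669-673.
    logits = torch.randn(2, 513, 1024)   # e.g. (batch, seq_len, hidden size)
    bias = torch.randn(30522)            # a bias sized for a different dimension cannot broadcast

    print(logits.shape, bias.shape)
    try:
        _ = logits + bias                # raises RuntimeError when trailing dims differ
    except RuntimeError as e:
        print("shape mismatch:", e)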

For the second question, I guess the problem may come from the reading order of your dataset. As you can see, your data is quite different from the original setting in our paper: the pre-trained weights assume a left-to-right, top-to-bottom reading order, so that pre-training setting may be an obstacle in your experiment. Considering this, I don't find the poor performance surprising. If you want to keep this reading-order setting, you may need to collect enough data to pre-train the model again or resort to other approaches. Another possible way is to rotate the image 90 degrees counterclockwise so that it becomes similar to the common reading-order setting.
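
If you try the rotation, the boxes need the same transform as the image. A minimal sketch, assuming LayoutLM-style coordinates already normalized to the 0-1000 range (adjust page_width if you work in pixels):

    # Rotate a layout box 90 degrees counterclockwise.
    # Assumes box format [x_min, y_min, x_max, y_max], origin at the top-left,
    # and coordinates normalized to [0, 1000].
    def rotate_box_ccw(box, page_width=1000):
        x0, y0, x1, y1 = box
        # (x, y) -> (y, page_width - x), re-ordered so min <= max in each axis
        return [y0, page_width - x1, y1, page_width - x0]

    # A box near the right edge moves near the top, so right-to-left /
    # top-to-bottom text becomes the usual left-to-right / top-to-bottom order.
    print(rotate_box_ccw([900, 100, 950, 400]))  # [100, 50, 400, 100]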

zlwang-cs · Aug 15 '22 02:08

Hi @zlwang-cs, thanks for the prompt reply. I'll try to debug this and see if I can locate the problem. Rotating the image 90 degrees counterclockwise seems quite reasonable, and I'll try that too.

There is one thing that still puzzles me. You mentioned that LayoutReader loads pre-trained weights during training. Are these pre-trained weights word-level or textline-level? It seems that I need textline-level pre-trained weights here.

Mountchicken · Aug 15 '22 02:08

Hi @Mountchicken, the pre-trained model is word-level. Unfortunately, I cannot help you with textline-level pre-training.

zlwang-cs · Aug 15 '22 03:08

Hi @zlwang-cs, thanks for the reply. BTW, how do I load the pre-trained weights and fine-tune on my own dataset? I downloaded layoutreader-base-readingbank.zip from this link and got config.json and pytorch_model.bin after unpacking it.

Which arg should I assign below?

python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
    --model_type layoutlm \
    --model_name_or_path layoutlm-base-uncased \
    --train_folder /path/to/ReadingBank/train \
    --output_dir /path/to/output/LayoutReader/layoutlm \
    --do_lower_case \
    --fp16 \
    --fp16_opt_level O2 \
    --max_source_seq_length 513 \
    --max_target_seq_length 511 \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --learning_rate 7e-5 \
    --num_warmup_steps 500 \
    --num_training_steps 75000 \
    --cache_dir /path/to/output/LayoutReader/cache \
    --label_smoothing 0.1 \
    --save_steps 5000 \
    --cached_train_features_file /path/to/ReadingBank/features_train.pt

Mountchicken · Aug 15 '22 06:08

Hi @Mountchicken, weight loading is quite common in practice; please refer to the related documents and I am sure you can find the right answer. Also, I see you are using run_seq2seq.py, which is for training, but the weights you downloaded are actually for decoding. I guess that is the reason why you are confused.
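
In plain PyTorch terms, a first sanity check usually looks something like the sketch below; the path is just the folder you described after unzipping, and this is not the exact s2s_ft loading code:

    import torch

    # Peek at the downloaded checkpoint to see what it contains.
    state_dict = torch.load("layoutreader-base-readingbank/pytorch_model.bin",
                            map_location="cpu")
    print(list(state_dict.keys())[:5])

    # Then build the matching model and load the weights into it, e.g.:
    # missing, unexpected = model.load_state_dict(state_dict, strict=False)
    # print(missing, unexpected)  # mismatched keys usually point to the wrong model class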

zlwang-cs · Aug 23 '22 20:08