
Ablation study on `Multi-way Transformer Decoder`

Open amos-x-wang opened this issue 1 year ago • 0 comments

Dear authors:

First of all, amazing work, and I enjoyed it a lot.

Your paper stated:

"In this ablation study, we use a hybrid transformer as a baseline encoder and train both models from scratch with a longer side of the input image of 768, and test them with an image size of 1280."

According to Table 7, the improvement in DET and E2E is minor (in my personal view only, as ICDAR 2015 and Total-Text are small-scale datasets).


Would you be willing to share more about this ablation study? For example:

1. What is the improvement from the Multi-way Decoder in the final model, i.e.,

  • when using the Swin transformer, and
  • when the image resolution is 1920?

2. What do the convergence curves look like with and without the Multi-way Decoder?

amos-x-wang, Dec 05 '23 15:12