Ablation study on `Multi-way Transformer Decoder`
Dear authors:
First of all, amazing work, and I enjoyed it a lot.
Your paper states:
> In this ablation study, we use a hybrid transformer as a baseline encoder and train both models from scratch with a longer side of the input image of 768, and test them with an image size of 1280.
According to Table 7, the improvement in DET and E2E is minor (in my personal view only, as ICDAR 2015 and Total-Text are small-scale datasets).
Would you be willing to share more about this ablation study? For example:
1. What is the improvement from the `Multi-way Decoder` in the final model, i.e.,
   - when using the Swin Transformer, and
   - when the image resolution is 1920?
2. What do the convergence curves look like with and without the `Multi-way Decoder`?