
training code

Open an1018 opened this issue 3 years ago • 13 comments

Hi, thanks for your great work! When will you release the training code?

an1018 avatar Oct 18 '22 10:10 an1018

Hi, thanks for your attention to our work. We will release the training code after the acceptance of our work DocScanner.

fh2019ustc avatar Oct 18 '22 11:10 fh2019ustc

Thanks for your reply. Could you tell us about your training environment (e.g., the number and model of GPUs) and the training time of the geometric unwarping transformer and the illumination correction transformer?

an1018 avatar Oct 19 '22 08:10 an1018

For geometric unwarping, we use 4 GPUs for training, which takes about 3 days. For illumination correction, we use 2 GPUs for training, which takes about 1 day. In fact, we do not conduct hyper-parameter tuning experiments on the batch size, learning rate, or number of GPUs.

fh2019ustc avatar Oct 21 '22 03:10 fh2019ustc

Thanks for your detailed explanation. DocScanner was trained on NVIDIA RTX 2080 Ti and NVIDIA GTX 1080 Ti GPUs; which one is used for DocTr?

an1018 avatar Oct 25 '22 03:10 an1018

Hi, for DocTr we use 1080 Ti GPUs. In fact, based on our experience, the GPU model does not seem to affect the performance of our method.

fh2019ustc avatar Oct 25 '22 04:10 fh2019ustc

When writing the training code, I have some confusion.

  1. Before training the GeoTr module, the background needs to be removed. Is this handled by the pre-trained model of the segmentation module? [image]
  2. After removing the background, should the result look like the image on the right? [image]
  3. But in DocScanner, is the ground-truth mask the result of the document localization module? If so, why is it called the ground truth? [image]

an1018 avatar Nov 08 '22 14:11 an1018

Thanks for your attention to our work.

  1. To train the segmentation module, we remove the noisy backgrounds using the GT masks rather than the pre-trained segmentation module. This is the same for our DocTr and DocScanner.
  2. You can also upsample the mask to the same resolution as the input image and then multiply them at the original resolution (see the sketch below).
  3. See the answer to (1).
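
As a rough illustration of point 2, here is a minimal sketch of upsampling the mask and multiplying at the original resolution. It is not the released code; the resize routine, array layouts, and names are assumptions.

```python
import cv2
import numpy as np

def remove_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the background of `image` with a (possibly lower-resolution) mask.

    Assumed (not taken from the released code): `image` is H x W x 3,
    `mask` is h x w with non-zero values on the document region.
    """
    # Upsample the mask to the input image's resolution (cv2.resize expects (width, height)).
    mask_up = cv2.resize(mask.astype(np.uint8),
                         (image.shape[1], image.shape[0]),
                         interpolation=cv2.INTER_NEAREST)
    # Multiply at the original resolution so that only the document pixels remain.
    return image * (mask_up > 0)[..., None].astype(image.dtype)
```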

Hope this helps.

fh2019ustc avatar Nov 08 '22 15:11 fh2019ustc

Is there any reference code? And what do the GT masks represent in the Doc3D dataset? [image]

an1018 avatar Nov 09 '22 01:11 an1018

In fact, it is easy to extract the GT mask of the document image from the other annotations. For example, in the UV map, the values of the background region are 0.
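
A minimal sketch of that extraction, assuming the UV map has already been loaded as an H x W x C float array whose background pixels are all zero (loading the Doc3D files themselves is not shown, and the names are illustrative):

```python
import numpy as np

def gt_mask_from_uv(uv: np.ndarray) -> np.ndarray:
    """Derive a binary document mask from a UV annotation.

    Assumed: `uv` is an H x W x C float array in which every channel of a
    background pixel is exactly 0, as described above.
    """
    mask = np.any(uv != 0, axis=-1)     # True on the document region
    return mask.astype(np.uint8) * 255  # 0 = background, 255 = document
```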

fh2019ustc avatar Nov 09 '22 03:11 fh2019ustc

@fh2019ustc I've written the training code, but the model does not converge. I've sent the code to your email ([email protected]); could you take a look at it? Thanks very much.

an1018 avatar Nov 16 '22 14:11 an1018

@an1018 So, have you reproduced it successfully with your own training code?

Aiden0609 avatar Apr 15 '23 01:04 Aiden0609

@an1018 So have you successfully written your own training code?

minhduc01168 avatar Apr 11 '24 10:04 minhduc01168

> In fact, it is easy to extract the GT mask of the document image from the other annotations. For example, in the UV map, the values of the background region are 0.

For the other annotation types in Doc3D, for example the backward mapping, are the values of the background region also 0? And when the loss function compares the ground-truth backward map "bm_gt" with the output "bm_tr" of DocTr, do I need to remove the background in bm_gt? Thank you!
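
For concreteness, the kind of masked comparison I am asking about would look roughly like this; it is a hypothetical sketch, not DocTr's actual training loss, and the tensor shapes and names are my assumptions:

```python
import torch

def masked_bm_l1(bm_tr: torch.Tensor, bm_gt: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical masked L1 loss between predicted and GT backward maps.

    Assumed shapes (not from the repo): bm_tr and bm_gt are B x 2 x H x W,
    mask is B x 1 x H x W with 1 on the document region and 0 elsewhere, so
    background pixels contribute nothing to the loss.
    """
    diff = torch.abs(bm_tr - bm_gt) * mask   # mask broadcasts over the 2 channels
    return diff.sum() / (mask.sum() * bm_tr.shape[1] + 1e-6)
```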

zhaolitc avatar Jan 15 '25 13:01 zhaolitc