Rowel Atienza

Results: 36 comments by Rowel Atienza

ViTSTR loads RGB images but converts them to grayscale before processing: https://github.com/roatienza/deep-text-recognition-benchmark/blob/ea0d07737e334a97aa0a7df9af3118f85a2b49c2/dataset.py#L210 There is an option to use the RGB image as is: https://github.com/roatienza/deep-text-recognition-benchmark/blob/ea0d07737e334a97aa0a7df9af3118f85a2b49c2/dataset.py#L207
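For illustration only, a minimal sketch of the two loading paths with PIL; the `rgb` flag and the file name are hypothetical placeholders, not the repository's actual arguments:

```python
from PIL import Image

def load_image(path: str, rgb: bool = False) -> Image.Image:
    """Open an image and either keep RGB or collapse it to grayscale ('L')."""
    img = Image.open(path)
    # Grayscale is the default path; pass rgb=True to keep the 3 channels.
    return img.convert("RGB") if rgb else img.convert("L")

# Example (hypothetical file): img = load_image("word_crop.png", rgb=True)
```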

The pre-trained weights were used since transformers do not have an inductive bias. However, for the case of STR, since MJSynth and SynthText are both large in number (though lacking in...

Hi, I do not have a set of instructions for training ViTSTR on non-Latin characters. Training on non-Latin requires: 1) Labelled train/test datasets in LMDB format 2) Change the number...
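As a rough sketch only, this is how an LMDB dataset in the key layout commonly used by deep-text-recognition-benchmark-style loaders can be written with the `lmdb` package; the key names and the `samples` list here are assumptions for illustration, not the repository's own conversion script:

```python
import lmdb

def write_lmdb(output_path: str, samples: list) -> None:
    """Write (image_bytes, label) pairs into an LMDB environment.

    Assumes the common 'image-%09d' / 'label-%09d' / 'num-samples'
    key convention used by deep-text-recognition-benchmark-style readers.
    """
    env = lmdb.open(output_path, map_size=1 << 40)  # large mapped space for big datasets
    with env.begin(write=True) as txn:
        for i, (image_bytes, label) in enumerate(samples, start=1):
            txn.put(f"image-{i:09d}".encode(), image_bytes)
            txn.put(f"label-{i:09d}".encode(), label.encode("utf-8"))
        txn.put(b"num-samples", str(len(samples)).encode())

# Example (hypothetical files/labels):
# samples = [(open("img1.png", "rb").read(), "안녕"), ...]
# write_lmdb("data/train_lmdb", samples)
```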

It should run even on a single GPU. For instance, running the same script, the memory consumption is: `| 3 N/A N/A 125879 C python3 9087MiB |`
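For reference, a minimal sketch of checking GPU memory use from inside a training script with PyTorch (numbers will differ somewhat from nvidia-smi, which also counts the CUDA context and cached allocator blocks):

```python
import torch

# Report allocated and peak GPU memory for the current process.
if torch.cuda.is_available():
    device = torch.device("cuda")
    allocated_mb = torch.cuda.memory_allocated(device) / 1024**2
    peak_mb = torch.cuda.max_memory_allocated(device) / 1024**2
    print(f"allocated: {allocated_mb:.0f} MiB, peak: {peak_mb:.0f} MiB")
```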

Hi, the resized images (224x224) are still human readable. The attention maps on square images also appear to give proper weight to each character region. Other than these, there...
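For reference, a minimal sketch of the square resize with PIL so the result can be inspected by eye; the file name and the bilinear resampling choice are assumptions:

```python
from PIL import Image

def resize_square(path: str, size: int = 224) -> Image.Image:
    """Load a word crop, convert to grayscale, and resize to a size x size square."""
    img = Image.open(path).convert("L")
    return img.resize((size, size), Image.BILINEAR)

# Example (hypothetical file): resize_square("word_crop.png").save("word_crop_224.png")
```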

Hi, thanks for using ViTSTR. If the validation accuracy is 99.9%, something is wrong. Unfortunately, I could not find the demo script I used for ViTSTR, but it is simply...
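As a sanity check only (not the original demo script), here is a minimal sketch of greedy decoding and exact-match accuracy over model logits; the character list, the `[GO]`/`[s]` tokens, and the dummy tensors are assumptions for illustration:

```python
import torch

# Hypothetical token setup (ViTSTR-style: '[GO]' start token, '[s]' end token).
CHARS = ["[GO]", "[s]"] + list("0123456789abcdefghijklmnopqrstuvwxyz")

def greedy_decode(logits: torch.Tensor) -> list:
    """Turn (batch, seq_len, num_classes) logits into strings, stopping at '[s]'."""
    preds = logits.argmax(dim=-1)  # (batch, seq_len)
    texts = []
    for seq in preds:
        chars = []
        for idx in seq.tolist():
            token = CHARS[idx]
            if token == "[s]":
                break
            if token != "[GO]":
                chars.append(token)
        texts.append("".join(chars))
    return texts

def accuracy(pred_texts: list, gt_texts: list) -> float:
    """Exact-match word accuracy: a prediction counts only if it equals the label."""
    correct = sum(p == g for p, g in zip(pred_texts, gt_texts))
    return correct / max(len(gt_texts), 1)

# Dummy usage with random logits standing in for model output (batch=2, seq_len=25).
dummy_logits = torch.randn(2, 25, len(CHARS))
preds = greedy_decode(dummy_logits)
print(preds, accuracy(preds, ["hello", "world"]))
```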

Thanks. Normalization was not part of the CLOVA AI training/eval protocol that we used. So, we did not try normalization. We just reproduced their results and followed the same protocol...

You might want to train from scratch (instead of from a pre-trained ViT) if you have access to a big training dataset. In such cases, you can train without resizing the...
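For illustration, a minimal sketch of switching between from-scratch and pre-trained ViT backbones with `timm`; the model name and class count are assumptions, not ViTSTR's exact configuration:

```python
import timm

# From-scratch training: random initialization, no ImageNet weights.
model_scratch = timm.create_model("vit_tiny_patch16_224", pretrained=False, num_classes=96)

# Fine-tuning: start from ImageNet pre-trained weights to compensate
# for the transformer's weak inductive bias when the dataset is smaller.
model_pretrained = timm.create_model("vit_tiny_patch16_224", pretrained=True, num_classes=96)
```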

There are various techniques that can be used to improve the performance of transformer-based models. The simplest is training on a large dataset to overcome the lack of inductive...