Rowel Atienza

Results: 36 comments by Rowel Atienza

ViTSTR loads RGB images but converts them to grayscale before processing: https://github.com/roatienza/deep-text-recognition-benchmark/blob/ea0d07737e334a97aa0a7df9af3118f85a2b49c2/dataset.py#L210 There is an option to use the RGB image as is: https://github.com/roatienza/deep-text-recognition-benchmark/blob/ea0d07737e334a97aa0a7df9af3118f85a2b49c2/dataset.py#L207
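For illustration only, a minimal sketch of the two loading paths with PIL; the `rgb` flag and the file name are hypothetical placeholders, not the repository's actual arguments:

```python
from PIL import Image

def load_image(path: str, rgb: bool = False) -> Image.Image:
    """Open an image and either keep RGB or collapse it to grayscale ('L')."""
    img = Image.open(path)
    # Grayscale is the default path; pass rgb=True to keep the 3 channels.
    return img.convert("RGB") if rgb else img.convert("L")

# Example (hypothetical file): img = load_image("word_crop.png", rgb=True)
```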

The pre-trained weights were used since transformers do not have an inductive bias. However, for the case of STR, since MJSynth and SynthText are both large in number (though lacking in...

Hi, I do not have a set of instructions for training ViTSTR on non-Latin characters. Training on non-Latin requires: 1) Labelled train/test datasets in LMDB format 2) Change the number...
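As a rough sketch only, this is how an LMDB dataset in the key layout commonly used by deep-text-recognition-benchmark-style loaders can be written with the `lmdb` package; the key names and the `samples` list here are assumptions for illustration, not the repository's own conversion script:

```python
import lmdb

def write_lmdb(output_path: str, samples: list) -> None:
    """Write (image_bytes, label) pairs into an LMDB environment.

    Assumes the common 'image-%09d' / 'label-%09d' / 'num-samples'
    key convention used by deep-text-recognition-benchmark-style readers.
    """
    env = lmdb.open(output_path, map_size=1 << 40)  # large mapped space for big datasets
    with env.begin(write=True) as txn:
        for i, (image_bytes, label) in enumerate(samples, start=1):
            txn.put(f"image-{i:09d}".encode(), image_bytes)
            txn.put(f"label-{i:09d}".encode(), label.encode("utf-8"))
        txn.put(b"num-samples", str(len(samples)).encode())

# Example (hypothetical files/labels):
# samples = [(open("img1.png", "rb").read(), "안녕"), ...]
# write_lmdb("data/train_lmdb", samples)
```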

It should run even on a single GPU. For instance, running the same script, the memory consumption is: `| 3 N/A N/A 125879 C python3 9087MiB |`
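For reference, a minimal sketch of checking GPU memory use from inside a training script with PyTorch (numbers will differ somewhat from nvidia-smi, which also counts the CUDA context and cached allocator blocks):

```python
import torch

# Report allocated and peak GPU memory for the current process.
if torch.cuda.is_available():
    device = torch.device("cuda")
    allocated_mb = torch.cuda.memory_allocated(device) / 1024**2
    peak_mb = torch.cuda.max_memory_allocated(device) / 1024**2
    print(f"allocated: {allocated_mb:.0f} MiB, peak: {peak_mb:.0f} MiB")
```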

Hi, the resized images (224x224) are still human readable. The attention maps on square images also appear to give proper weight to each character region. Other than these, there...
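For reference, a minimal sketch of the square resize with PIL so the result can be inspected by eye; the file name and the bilinear resampling choice are assumptions:

```python
from PIL import Image

def resize_square(path: str, size: int = 224) -> Image.Image:
    """Load a word crop, convert to grayscale, and resize to a size x size square."""
    img = Image.open(path).convert("L")
    return img.resize((size, size), Image.BILINEAR)

# Example (hypothetical file): resize_square("word_crop.png").save("word_crop_224.png")
```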

Hi, thanks for using ViTSTR. If the validation accuracy is 99.9%, something is wrong. Unfortunately, I could not find the demo script I used for ViTSTR, but it is simply...
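As a sanity check only (not the original demo script), here is a minimal sketch of greedy decoding and exact-match accuracy over model logits; the character list, the `[GO]`/`[s]` tokens, and the dummy tensors are assumptions for illustration:

```python
import torch

# Hypothetical token setup (ViTSTR-style: '[GO]' start token, '[s]' end token).
CHARS = ["[GO]", "[s]"] + list("0123456789abcdefghijklmnopqrstuvwxyz")

def greedy_decode(logits: torch.Tensor) -> list:
    """Turn (batch, seq_len, num_classes) logits into strings, stopping at '[s]'."""
    preds = logits.argmax(dim=-1)  # (batch, seq_len)
    texts = []
    for seq in preds:
        chars = []
        for idx in seq.tolist():
            token = CHARS[idx]
            if token == "[s]":
                break
            if token != "[GO]":
                chars.append(token)
        texts.append("".join(chars))
    return texts

def accuracy(pred_texts: list, gt_texts: list) -> float:
    """Exact-match word accuracy: a prediction counts only if it equals the label."""
    correct = sum(p == g for p, g in zip(pred_texts, gt_texts))
    return correct / max(len(gt_texts), 1)

# Dummy usage with random logits standing in for model output (batch=2, seq_len=25).
dummy_logits = torch.randn(2, 25, len(CHARS))
preds = greedy_decode(dummy_logits)
print(preds, accuracy(preds, ["hello", "world"]))
```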

Thanks. Normalization was not part of the CLOVA AI training/eval protocol that we used. So, we did not try normalization. We just reproduced their results and followed the same protocol...

You might want to train from scratch (instead of from a pre-trained ViT) if you have access to a big training dataset. In such cases, you can train without resizing the...
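For illustration, a minimal sketch of switching between from-scratch and pre-trained ViT backbones with `timm`; the model name and class count are assumptions, not ViTSTR's exact configuration:

```python
import timm

# From-scratch training: random initialization, no ImageNet weights.
model_scratch = timm.create_model("vit_tiny_patch16_224", pretrained=False, num_classes=96)

# Fine-tuning: start from ImageNet pre-trained weights to compensate
# for the transformer's weak inductive bias when the dataset is smaller.
model_pretrained = timm.create_model("vit_tiny_patch16_224", pretrained=True, num_classes=96)
```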

There are various techniques that can be used to improve the performance of transformer-based models. The simplest is training on a large dataset to overcome the lack of inductive...