
Could you please provide a detailed explanation of the PARSeq architecture?

Open shanmugamani1023 opened this issue 1 year ago • 4 comments

I have reviewed your documentation, but I am unable to understand how the prediction is made. Could you explain the entire procedure step by step?

shanmugamani1023 · Nov 23 '23 10:11

Kindly refer to our ECCV presentation: https://drive.google.com/file/d/11VoZW4QC5tbMwVIjKB44447uTiuCJAAD/view

It contains a brief step-by-step walkthrough of the decoding process.

baudm · Nov 24 '23 11:11

Thank you. I understand that you are using an ensemble of autoregressive (AR) models to detect scene text in real time from video. However, I am confused about how the image is encoded and decoded, and how the model is trained.

To understand that process: how is the input image split up into features, and how does the processing flow from there? I would like to know the process flow of the system in the attached image.

shanmugamani1023 · Nov 25 '23 06:11

Thank you for your presentation video. Now I have a better understanding of the architecture.

Let me explain my understanding; please correct me if I am mistaken.

Using the encoder and decoder, the text is decoded from the image. During training, the output text is compared with the input text, and the model is trained on that comparison.

The output will be generated if I provide new text. Am I correct?
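
To make the flow described above concrete, here is a toy sketch in plain Python. It is not PARSeq's code: `CHARSET`, `toy_decoder`, and the fake "image features" (a list of token indices) are hypothetical stand-ins, and only plain left-to-right decoding is shown. It illustrates the two steps: at training time the predicted distribution at each position is compared with the label via cross-entropy, and at inference time tokens are picked greedily until an end-of-sequence token appears.

```python
import math

CHARSET = "abc"          # hypothetical tiny alphabet
EOS = len(CHARSET)       # index of the end-of-sequence token


def toy_decoder(image_features, prefix):
    """Hypothetical stand-in for a decoder.

    Returns logits over CHARSET + EOS for the next position. A real
    autoregressive decoder would attend to the tokens in `prefix`; this toy
    only uses its length, and pretends the image features already hold the
    correct token index for each position.
    """
    pos = len(prefix)
    logits = [0.0] * (len(CHARSET) + 1)
    target = image_features[pos] if pos < len(image_features) else EOS
    logits[target] = 5.0
    return logits


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(image_features, max_len=10):
    """Inference: pick the most likely token at each step until EOS."""
    tokens = []
    for _ in range(max_len):
        probs = softmax(toy_decoder(image_features, tokens))
        nxt = max(range(len(probs)), key=probs.__getitem__)
        if nxt == EOS:
            break
        tokens.append(nxt)
    return "".join(CHARSET[t] for t in tokens)


def cross_entropy(image_features, label):
    """Training signal: mean negative log-likelihood of the label,
    feeding the ground-truth prefix at each step (teacher forcing)."""
    targets = [CHARSET.index(c) for c in label] + [EOS]
    prefix, loss = [], 0.0
    for t in targets:
        probs = softmax(toy_decoder(image_features, prefix))
        loss -= math.log(probs[t])
        prefix.append(t)
    return loss / len(targets)


features = [2, 0, 1]                      # fake "encoded image" for the word "cab"
decoded = greedy_decode(features)
loss_right = cross_entropy(features, "cab")
loss_wrong = cross_entropy(features, "abc")
```

In this toy, the loss for the matching label is much smaller than for a mismatched one, which is the signal a real model would use to update its weights. At inference time no label is needed, only the image.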

shanmugamani1023 · Nov 30 '23 06:11

There is a runtime error in your Hugging Face Space.

shanmugamani1023 · Nov 30 '23 06:11