parseq
Could you please provide a detailed explanation of the PARSeq architecture?
I have reviewed your documentation, but I am unable to understand how the prediction is made. Could you explain the entire procedure step by step?
Kindly refer to our ECCV presentation: https://drive.google.com/file/d/11VoZW4QC5tbMwVIjKB44447uTiuCJAAD/view
It contains a brief step-by-step walkthrough of the decoding process.
Thank you. I understand that you are using an ensemble of autoregressive (AR) models to recognize scene text in real time from video. However, I am confused about how the image is encoded and decoded, and how the model is trained.
To understand that process: how is the input image split into features, and how does the processing flow proceed?
I would like to know the process flow for the attached image.
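For orientation, here is a minimal, hypothetical sketch of the usual ViT-style encoding flow (module names, image size, and dimensions below are illustrative assumptions, not the actual PARSeq code): the image is cut into fixed-size patches, each patch is projected to an embedding, and self-attention layers produce one feature vector per patch, which the decoder later attends to.

```python
# Illustrative sketch only (assumed sizes; not the actual PARSeq implementation).
import torch
import torch.nn as nn

class TinyPatchEncoder(nn.Module):
    """Split the image into patches, embed them, and contextualize them with self-attention."""
    def __init__(self, img_size=(32, 128), patch=8, dim=192):
        super().__init__()
        # Patchify + linear projection in one strided convolution.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size[0] // patch) * (img_size[1] // patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))  # learned positions
        layer = nn.TransformerEncoderLayer(dim, nhead=3, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, img):                              # img: (B, 3, 32, 128)
        x = self.proj(img).flatten(2).transpose(1, 2)    # -> (B, n_patches, dim)
        return self.encoder(x + self.pos)                # contextualized patch features

feats = TinyPatchEncoder()(torch.randn(2, 3, 32, 128))
print(feats.shape)  # torch.Size([2, 64, 192]): one feature vector per image patch
```

The decoder then cross-attends to these patch features and predicts the text one character at a time, as walked through in the presentation.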
Thank you for your presentation video. Now I have a better understanding of the architecture.
Let me explain; correct me if I am mistaken.
Using the encoder and decoder, the text is decoded from the image. During training, the predicted output text is compared with the ground-truth input text, and the model is trained from that comparison.
Then, when I provide a new image, the output text will be generated. Am I correct?
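That description matches the usual encoder-decoder training recipe. Below is a hedged sketch of both phases, assuming a generic Transformer decoder on top of patch features like those in the earlier snippet (names and sizes are illustrative, not the PARSeq code): during training the decoder is fed the ground-truth label (teacher forcing) and a cross-entropy loss compares its predictions to that label; at inference only a new image is provided, and characters are generated one by one.

```python
# Illustrative sketch only (assumed charset size and dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim, max_len = 100, 192, 25
embed   = nn.Embedding(vocab_size, dim)
decoder = nn.TransformerDecoderLayer(dim, nhead=3, batch_first=True)
head    = nn.Linear(dim, vocab_size)

def training_step(patch_feats, label_ids):
    """Teacher forcing: feed the ground-truth label, compare predictions against it."""
    tgt_in = embed(label_ids[:, :-1])                    # shifted-right label tokens
    L = tgt_in.size(1)
    causal = torch.triu(torch.full((L, L), float('-inf')), diagonal=1)
    logits = head(decoder(tgt_in, patch_feats, tgt_mask=causal))
    return F.cross_entropy(logits.reshape(-1, vocab_size),
                           label_ids[:, 1:].reshape(-1)) # loss drives the weight updates

@torch.no_grad()
def greedy_decode(patch_feats, bos_id=1):
    """Inference: no input text, only image features; predict one character at a time."""
    ids = torch.full((patch_feats.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        logits = head(decoder(embed(ids), patch_feats))[:, -1]   # next-character logits
        ids = torch.cat([ids, logits.argmax(-1, keepdim=True)], dim=1)
    return ids  # cut at the end-of-sequence token to obtain the final string
```

The sketch mirrors the description above: during training the predicted text is compared to the ground-truth label, and at inference only a new image is provided and the text is generated. PARSeq itself additionally trains with permuted AR attention masks and can refine its output iteratively, as described in the presentation.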
There is a run-time error in your Hugging Face Space.