NRTR
Questions
Hello, your code is very good. I want to ask: is your project working yet? Second, which version of TensorFlow does this project support?
Same question here... and if this version of your project isn't working, can you kindly tell us what is missing for now?
Honestly, I'm not quite sure; I followed the structure given in the paper, but it doesn't seem to converge.
I probably missed something somewhere. Right now there are two things that I believe might be causing the issue:
- The labels are padded to a length of 25 characters with null chars. This probably gives a big bias towards always predicting null chars (see the loss-masking sketch after this list).
- I am not sure that I implemented attention correctly. When I looked at other implementations on GitHub it seemed coherent, but the paper has a schematic where Q, K and V have different shapes, which is not implemented.
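For the first point, one common fix is to mask the padded positions out of the loss so the null chars contribute nothing to the gradient. A minimal sketch of what I mean; `PAD_ID` and the tensor shapes are placeholders, not names from this repo:

```python
import tensorflow as tf

PAD_ID = 0  # hypothetical id of the null/padding character

def masked_cross_entropy(labels, logits):
    """Cross-entropy that ignores padded label positions.

    labels: [batch, time] int tensor, logits: [batch, time, vocab].
    """
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)                      # [batch, time]
    mask = tf.cast(tf.not_equal(labels, PAD_ID), loss.dtype)
    # Average only over real (non-padding) characters.
    return tf.reduce_sum(loss * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)
```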
I'll probably have the time to take another shot at it this week, but I can't promise much.
A bit late to the party, but my conjecture is that the masking is not working properly. During training the model at some point starts to converge very fast, which would indicate that it gets improper access to the expected output.
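If that's the case, a standard look-ahead mask along the lines of the original Transformer should prevent it. A minimal sketch (not taken from this code base):

```python
import tensorflow as tf

def look_ahead_mask(size):
    """Lower-triangular mask: position i may only attend to positions <= i."""
    # 1.0 marks forbidden (future) positions, 0.0 marks allowed ones.
    return 1.0 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

# Typical use inside scaled dot-product attention:
#   logits = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
#   logits += look_ahead_mask(seq_len) * -1e9   # block attention to future chars
#   weights = tf.nn.softmax(logits, axis=-1)
```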
Also (and importantly), at inference time the model is fed a zeroed array of the output's shape. After re-reading the original paper I am fairly sure that this isn't right. The output should be re-fed into the decoder in a similar way to seq2seq models.
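Something along these lines for greedy decoding; `decoder`, the special token ids and `MAX_LEN` are hypothetical placeholders rather than names from this repo, and the sketch assumes eager execution:

```python
import tensorflow as tf

START_ID, END_ID, MAX_LEN = 1, 2, 25  # placeholder special tokens / max length

def greedy_decode(decoder, encoder_features):
    """Re-feed the partial output into the decoder, one character per step."""
    output = tf.constant([[START_ID]], dtype=tf.int32)          # [1, 1]
    for _ in range(MAX_LEN):
        logits = decoder(encoder_features, output)              # [1, t, vocab]
        next_id = tf.argmax(logits[:, -1, :], axis=-1, output_type=tf.int32)
        output = tf.concat([output, next_id[:, tf.newaxis]], axis=-1)
        if int(next_id[0]) == END_ID:                            # stop at end token
            break
    return output[:, 1:]  # drop the start token
```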
Your code is very good, but I suppose you should add an embedding layer and re-feed the output to the decoder.
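In most Transformer implementations that embedding is just a trainable lookup table over the character vocabulary, scaled by sqrt(d_model) and combined with the positional encoding, and it is learned end-to-end with the rest of the network. A rough sketch with placeholder sizes:

```python
import tensorflow as tf

VOCAB_SIZE, D_MODEL = 100, 512   # placeholder values

# Trainable lookup table mapping character ids to D_MODEL-dimensional vectors.
char_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, D_MODEL)

def embed_characters(char_ids):
    """char_ids: [batch, time] int tensor of character indices."""
    x = char_embedding(char_ids)                    # [batch, time, D_MODEL]
    x *= tf.sqrt(tf.cast(D_MODEL, tf.float32))      # usual Transformer scaling
    return x                                        # add positional encoding afterwards
```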
In the paper there's a Character Embedding layer at the bottom of the decoder, but I can't seem to find it in the code. Besides, the paper only describes it as "a learned character-level embedding". Do you have any clue about what that embedding is?
Absolutely none. I'd be willing to add it, but I didn't find any documentation on what it was or how I was supposed to train it.
So, what now? Is your project still not working?
That is correct, as per the first line of the README. I am sorry if this inconveniences you, but I have not had the time to work on it recently.
I'm now implementing OCR with a Transformer; I can share my results at the end.
That's great. As far as I know, the initial paper was never properly reproduced, so we will finally be able to check its claims.