
Questions

Open aacufx2 opened this issue 6 years ago • 10 comments

Hello, your code is very good. I want to ask: is your project working yet? Second, which version of TensorFlow does this project support?

aacufx2 avatar Aug 13 '18 07:08 aacufx2

Same question here... and if this version of your project isn't working, can you kindly tell me what is missing for now?

Cancerce1l avatar Aug 13 '18 08:08 Cancerce1l

Honestly I'm not quite sure. I followed the structure given by the paper, but it doesn't seem to converge.

I probably missed something somewhere. Right now there are two things that I believe might be causing the issue:

  1. The labels are padded to 25 characters with null chars. This probably gives a strong bias towards always predicting null chars; a loss that masks out the padded positions would help (see the first sketch below).

  2. I am not sure that I implemented attention correctly. When I looked at other implementations on GitHub mine seemed coherent, but the paper has a schematic where Q, K and V have different shapes, which is not implemented (see the second sketch at the end of this comment).
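
To illustrate point 1, this is the kind of masked loss I have in mind. It is only a NumPy sketch; the padding index, the 25-character length and the function itself are assumptions for illustration, not code from this repo:

```python
import numpy as np

PAD_ID = 0    # assumed index of the null/padding character (not necessarily this repo's value)
MAX_LEN = 25  # labels padded to this length, as described above

def masked_cross_entropy(logits, labels):
    """Cross-entropy that ignores the padded positions.

    logits: (batch, MAX_LEN, vocab) unnormalized scores
    labels: (batch, MAX_LEN) integer character ids, PAD_ID where padded
    """
    # softmax over the vocabulary axis
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    batch, length, _ = logits.shape
    # negative log-likelihood of the target character at every position
    nll = -np.log(probs[np.arange(batch)[:, None], np.arange(length), labels] + 1e-9)

    # padded positions contribute nothing, so the model is not pushed
    # towards always predicting the null char
    mask = (labels != PAD_ID).astype(np.float64)
    return (nll * mask).sum() / mask.sum()
```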

I'll probably have the time to take another shot at it this week, but I can't promise much.
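
And for point 2, here is a shape-only NumPy sketch of scaled dot-product attention as described in "Attention Is All You Need". Whether the paper's schematic really means different per-head shapes for Q, K and V is exactly the part I am unsure about:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (..., len_q, d_k), K: (..., len_k, d_k), V: (..., len_k, d_v).

    Q and K share the depth d_k; V may use a different depth d_v, which is
    the only shape difference I can read out of the paper's schematic.
    """
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)  # (..., len_q, len_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)           # True = allowed to attend
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over len_k
    return weights @ V                                   # (..., len_q, d_v)
```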

Belval avatar Aug 13 '18 10:08 Belval

A bit late to the party, but my conjecture is that the masking is not working properly. During training, at some point the model starts to converge very fast, which would indicate that it gets improper access to the expected output.
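
To be concrete, what I would expect is the standard look-ahead (causal) mask, so that position i can never attend to later positions. A minimal NumPy sketch of the idea (not the code in this repo), which would plug straight into the `mask` argument of the attention sketch above:

```python
import numpy as np

def look_ahead_mask(seq_len):
    """Boolean mask where True means "allowed to attend".

    Position i may only attend to positions <= i, so the decoder cannot
    peek at the characters it has not produced yet.
    """
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Example: a 4-step decoder self-attention mask
print(look_ahead_mask(4).astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```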

Also (and importantly), at inference time the model is fed a zeroed array of the output's shape. After re-reading the original paper I am fairly sure that this isn't right: the output should be fed back into the decoder step by step, in the same way as seq2seq models.
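
This is roughly what I mean by re-feeding the output: greedy decoding, one character at a time. The `decoder` callable, the special token ids and the max length below are all made up for illustration:

```python
import numpy as np

START_ID, END_ID, MAX_LEN = 1, 2, 25  # made-up special ids and max length

def greedy_decode(decoder, encoder_out):
    """Feed the decoder its own previous predictions, one step at a time.

    `decoder(encoder_out, tokens)` is a hypothetical callable standing in
    for the real model; it returns logits of shape (len(tokens), vocab).
    """
    tokens = [START_ID]
    for _ in range(MAX_LEN):
        logits = decoder(encoder_out, np.array(tokens))
        next_id = int(np.argmax(logits[-1]))  # prediction for the last step
        if next_id == END_ID:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token
```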

Belval avatar Oct 31 '18 00:10 Belval

Your code is very good, but I suppose that you should add an embedding layer and feed the output back into the decoder.
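
As far as I understand, a learned character embedding is just a trainable lookup table from character ids to dense vectors, trained jointly with the rest of the network. A rough NumPy sketch (the vocabulary size and dimension are made up):

```python
import numpy as np

VOCAB_SIZE, D_MODEL = 100, 512  # made-up sizes, not the paper's

class CharEmbedding:
    """Trainable lookup table: character id -> dense vector.

    In a real model the gradient flows back into `self.table`, so the
    embedding is learned jointly with the rest of the network.
    """
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(0.0, 0.02, size=(VOCAB_SIZE, D_MODEL))

    def __call__(self, char_ids):
        # char_ids: (batch, seq_len) integers -> (batch, seq_len, D_MODEL)
        return self.table[char_ids]
```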

15307130116 avatar Mar 07 '19 09:03 15307130116

In the paper, there is a Character Embedding layer at the bottom of the decoder, but I can't seem to find it in the code. Besides, the paper only describes it as "a learned character-level embedding". So do you have any clue about what that embedding is?

fan003322 avatar Jul 23 '19 12:07 fan003322

Absolutely none. I'd be willing to add it, but I didn't find any documentation on what it was or how I was supposed to train it.

Belval avatar Jul 23 '19 13:07 Belval

So, what is the status now? Is your project still not working?

ghost avatar Mar 15 '20 16:03 ghost

That is correct, as per the first line of the README. I am sorry if this inconveniences you, but I have not had the time to work on it recently.

Belval avatar Mar 15 '20 19:03 Belval

I'm now implementing OCR with a Transformer; I can share my results when it's done.

ghost avatar Mar 18 '20 16:03 ghost

That's great. As far as I know, the initial paper was never properly reproduced, so we will finally be able to check its claims.

Belval avatar Mar 18 '20 16:03 Belval