handwriting-ocr
Problem when training the Seq2Seq model on my own dataset
Hi Breta,
First of all, thank you for your amazing work, I'm learning a lot from it!
Here is my problem. I am trying to train the Seq2Seq model on my own dataset, which is made of words. However, the dataset contains French words with accented characters such as 'é' or 'è'.
How do I extend the alphabet and train the model with these new characters?
Here is what I tried. I added the new characters to the existing alphabet in ocr.datahelpers. Then, in the Seq2Seq notebook, I loaded my images with their labels.
When I tuned the parameters, I changed char_size to 98, which is the number of characters I use. I didn't touch any other parameter.
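To make the change concrete, here is a minimal sketch of what I mean by extending the alphabet and deriving char_size from it instead of hard-coding 98. All names here (`BASE_ALPHABET`, `FRENCH_EXTRAS`, `build_alphabet`, `unknown_chars`) are hypothetical and are not the actual ocr.datahelpers API. The character lists are shortened for illustration, and I have not checked whether the notebook expects char_size to also count special tokens.

```python
# Hypothetical sketch; these names are NOT the real ocr.datahelpers API.
BASE_ALPHABET = list("abcdefghijklmnopqrstuvwxyz")  # shortened for illustration
FRENCH_EXTRAS = list("àâçéèêëîïôùûü")               # assumed set of accented letters


def build_alphabet(base, extras):
    """Return base plus any extras not already present, preserving order."""
    seen = set(base)
    alphabet = list(base)
    for ch in extras:
        if ch not in seen:
            alphabet.append(ch)
            seen.add(ch)
    return alphabet


alphabet = build_alphabet(BASE_ALPHABET, FRENCH_EXTRAS)
char_to_idx = {ch: i for i, ch in enumerate(alphabet)}

# Derived from the alphabet so the two cannot drift apart.
# Assumption: no extra special tokens are counted here.
char_size = len(alphabet)


def unknown_chars(labels, char_to_idx):
    """List characters that appear in the labels but not in the alphabet."""
    return sorted({ch for label in labels for ch in label if ch not in char_to_idx})
```

Running `unknown_chars` over all labels before training would show whether any character in my dataset is still missing from the alphabet.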
Then I get this error when I run the last cell:
`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
ValueError: could not broadcast input array from shape (148) into shape (120)`
I noticed that the first number (148) changes from run to run: (106), (108), (132), (268), (90), (70), ...
Do you have any idea where the problem lies and how I could deal with it, please?
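For what it's worth, I can reproduce the same kind of message in plain NumPy by assigning a 1-D array into a fixed-length row of a different length. This is only a sketch of my guess, not code from the notebook: `TARGET_LEN`, `try_assign` and `mismatched` are hypothetical names, and 120 is simply the value from the error message; I don't know where it is set.

```python
import numpy as np

TARGET_LEN = 120  # taken from the error message; where it is defined is unknown to me


def try_assign(length, target_len=TARGET_LEN):
    """Assign a 1-D array of `length` into a row of `target_len`.

    Returns the ValueError text if the assignment fails, otherwise None.
    """
    row = np.zeros(target_len)
    try:
        row[:] = np.ones(length)
    except ValueError as err:
        return str(err)
    return None


def mismatched(lengths, target_len=TARGET_LEN):
    """Return the lengths that differ from the fixed target length."""
    return [n for n in lengths if n != target_len]


# Lengths from my runs, plus 120 for comparison.
for n in (148, 90, 120):
    print(n, try_assign(n))
```

Both longer and shorter arrays fail here, which matches the values I see, so my guess is that some per-sample length in my data does not match a fixed size of 120 somewhere in the pipeline.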