handwriting-generation

Training numbers

Open impactcolor opened this issue 7 years ago • 14 comments

This is probably outside the scope of "issues", but I figured I'd ask.
I noticed it doesn't handle numbers. Is there a way to add numbers to the XML datasets so it can also generate numbers?

impactcolor avatar Oct 18 '17 19:10 impactcolor

You should be able to generate numbers like:

python generate.py --text="1 2 3 4 5 " --noinfo --bias=4.

although the quality will probably be quite bad (too few examples in the dataset).

You can add your own examples in .xml format, but you will have to match them to those already in the dataset (the content should contain tags like <Transcription>, <Text> and <StrokeSet>, structured as in the dataset).
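As a rough illustration of what such a file could look like, here is a minimal sketch that builds one sample with Python's standard library. Only <Transcription>, <Text> and <StrokeSet> come from the description above; the root element name, the <Stroke> and <Point> elements, their attributes, and the helper `build_sample` are assumptions made for this example, so compare the output against a real file from the dataset before relying on it.

```python
import xml.etree.ElementTree as ET

def build_sample(text, strokes):
    """Build one XML sample.

    `strokes` is a list of pen strokes; each stroke is a list of (x, y) points
    in the order they were drawn.
    """
    # Root element name is an assumption for this sketch.
    root = ET.Element("WhiteboardCaptureSession")

    # <Transcription> and <Text> hold the label for the sample.
    transcription = ET.SubElement(root, "Transcription")
    ET.SubElement(transcription, "Text").text = text

    # <StrokeSet> holds the pen trajectory.
    stroke_set = ET.SubElement(root, "StrokeSet")
    for stroke in strokes:
        # <Stroke> and <Point x=... y=...> are assumed element names.
        stroke_el = ET.SubElement(stroke_set, "Stroke")
        for x, y in stroke:
            ET.SubElement(stroke_el, "Point", x=str(x), y=str(y))
    return root

# A made-up single-stroke "7", just to exercise the function.
sample = build_sample("7", [[(0, 0), (10, 0), (4, 14)]])
xml_text = ET.tostring(sample, encoding="unicode")
print(xml_text)
```

Any extra attributes or elements that the real files carry would need to be added in the same way.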

Alternatively, if you have labeled data with consecutive points representing how each number is drawn, you could create your own dataset.

So depending on the format of your dataset, it might be easier or harder. :)

Grzego avatar Oct 18 '17 19:10 Grzego

I'm really new to this so I'm not sure how to go about creating a dataset. Do you have any articles or direction you can point me to?

impactcolor avatar Oct 18 '17 20:10 impactcolor

Sorry for the delay. I get the feeling you have no data, which is problematic. Could you please elaborate a little bit more on what you are trying to achieve? :)

Grzego avatar Oct 20 '17 13:10 Grzego

It's no problem; thank you for taking the time to even discuss this with me. I found a dataset of handwritten numbers, but it isn't set up like the current IAM dataset, which uses XML files. What I'm trying to accomplish is to use the handwriting generation, but it also has to include numbers, and currently the numbers do not come out well.

impactcolor avatar Oct 20 '17 19:10 impactcolor

Ok, is this dataset publicly available? I can look into it to see if there is a way to make it compatible with my code. :)

Grzego avatar Oct 21 '17 10:10 Grzego

Awesome! Here goes:

http://yann.lecun.com/exdb/mnist/

http://archive.ics.uci.edu/ml/machine-learning-databases/semeion/

I found these two

impactcolor avatar Oct 21 '17 18:10 impactcolor

Unfortunately, those datasets represent numbers as images. For handwriting generation you would need a list of consecutive points showing how each digit is written, so those datasets cannot be used here.
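To make the difference concrete: an image only says which pixels are dark, while stroke data records the order in which the pen moved. The sketch below turns ordered points into (dx, dy, pen_up) offsets, a representation often used for sequence models of handwriting. Whether this repository uses exactly this encoding is an assumption, and `strokes_to_offsets` is a hypothetical helper written for this example.

```python
def strokes_to_offsets(strokes):
    """Convert ordered pen strokes into (dx, dy, pen_up) steps.

    `strokes` is a list of strokes; each stroke is a list of (x, y) points.
    pen_up is 1 on the last point of a stroke (the pen leaves the paper after it).
    """
    steps = []
    prev = None
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            if prev is None:
                dx, dy = 0.0, 0.0  # the first point has no predecessor
            else:
                dx, dy = x - prev[0], y - prev[1]
            pen_up = 1 if i == len(stroke) - 1 else 0
            steps.append((dx, dy, pen_up))
            prev = (x, y)
    return steps

# A made-up two-stroke "4": an L-shaped stroke, then a vertical bar.
four = [[(0, 0), (0, 5), (4, 5)], [(3, 0), (3, 9)]]
offsets = strokes_to_offsets(four)
for step in offsets:
    print(step)
```

None of this ordering information can be read directly from a pixel grid such as an MNIST image, which is why image datasets do not fit here.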

Grzego avatar Oct 23 '17 21:10 Grzego

Would this one work? This has the stroke data: https://github.com/edwin-de-jong/mnist-digits-stroke-sequence-data/wiki/MNIST-digits-stroke-sequence-data

impactcolor avatar Oct 23 '17 22:10 impactcolor

This one might work. :) Can you give some examples of sequences you want to generate? I just want to figure out what kind of dataset augmentation might be needed.

Grzego avatar Oct 23 '17 23:10 Grzego

Random sequences of about 5 digits, for example 11445, 8013, 1507, etc.

impactcolor avatar Oct 23 '17 23:10 impactcolor

Sorry for the very late response. I tried this dataset and unfortunately it doesn't work well :/ The results are even worse than with the original IAM dataset. If by any chance I find a better dataset for this task, I will post it here.

Grzego avatar Nov 08 '17 20:11 Grzego

THANK YOU!!!!

impactcolor avatar Nov 08 '17 21:11 impactcolor

Well, it's been a while, but I was kind of interested in this problem and created an MNIST handwriting dataset. If you still need to generate numbers, you may find it useful. One simple solution is to just pick the needed digits from this dataset and concatenate them together. :)
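A minimal sketch of that concatenation idea, assuming the digits are available as a dict that maps each digit character to a list of samples, where a sample is a list of strokes and a stroke is a list of (x, y) points. That layout, the `gap` parameter, and the function `concat_digits` are hypothetical and not taken from the dataset itself.

```python
import random

def concat_digits(number, samples, gap=5.0, rng=random):
    """Place one random sample per digit side by side, left to right.

    `samples` maps a digit character to a list of samples (assumed layout);
    `gap` is the horizontal space inserted between neighbouring digits.
    """
    strokes = []
    cursor = 0.0  # x position where the next digit should start
    for ch in number:
        digit = rng.choice(samples[ch])
        xs = [x for stroke in digit for x, _ in stroke]
        left, right = min(xs), max(xs)
        shift = cursor - left  # move the digit so its left edge sits at cursor
        for stroke in digit:
            strokes.append([(x + shift, y) for x, y in stroke])
        cursor += (right - left) + gap
    return strokes

# Made-up samples, one per digit, just to show the call.
samples = {
    "1": [[[(0, 0), (0, 10)]]],
    "7": [[[(2, 0), (8, 0), (4, 10)]]],
}
result = concat_digits("17", samples, gap=5.0)
print(result)
```

Because each digit is picked independently, the writing style can change from digit to digit; restricting the choice to samples with similar size or slant would be one way to make the sequence look more uniform.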

Grzego avatar Dec 08 '17 11:12 Grzego

@Grzego THANK YOU!

impactcolor avatar Dec 28 '17 01:12 impactcolor