Key information extraction with DonUT on handwritten documents?
Hi everyone,
Has anyone tried fine-tuning DonUT for key information extraction on a corpus of documents that are half digital and half handwritten? Specifically, I am wondering whether there is any evidence of how it performs on handwritten text, given that all the suggestions for generating a synthetic pre-training dataset with SynthDoG focus on selecting appropriate fonts for the digital text.
I have a private corpus of invoices similar in nature to CORD (with somewhat more variability in shape, size and format), but some of them occasionally contain sections of handwritten text, either in addition to or in place of digital text.
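For context on what the fine-tuning target looks like: a minimal sketch, assuming CORD-style JSON ground truth and the `<s_key>value</s_key>` tagging (with `<sep/>` between list items) that the Donut repository's `json2token` helper uses. This is a standalone re-implementation for illustration, not the library's code, and the field names and values below are hypothetical. Since Donut is OCR-free, the target sequence is the same whether a field was printed or handwritten; only the input image differs.

```python
from typing import Any


def json_to_token(obj: Any, sort_keys: bool = True) -> str:
    """Serialize nested JSON ground truth into a Donut-style tag sequence (sketch).

    Dicts become <s_key>value</s_key>, list items are joined with <sep/>,
    and leaf values are converted to strings.
    """
    if isinstance(obj, dict):
        # Reverse-sorted keys, mirroring the default ordering assumed above.
        keys = sorted(obj.keys(), reverse=True) if sort_keys else list(obj.keys())
        return "".join(f"<s_{k}>{json_to_token(obj[k], sort_keys)}</s_{k}>" for k in keys)
    if isinstance(obj, list):
        return "<sep/>".join(json_to_token(item, sort_keys) for item in obj)
    return str(obj)


# Hypothetical invoice annotation: "total_price" might be handwritten on the scan,
# but it is annotated exactly like a printed value.
gt = {
    "menu": [{"nm": "Coffee", "price": "3.50"}, {"nm": "Bagel", "price": "2.00"}],
    "total": {"total_price": "5.50"},
}
print(json_to_token(gt))
```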
I can confirm that it also picks up handwritten information.
Thank you @Toon-nooT,
Could you share one example document with handwritten text that you tested DonUT on?
No stress if it's not possible.