Key information extraction with DonUT on handwritten documents?

Open DiTo97 opened this issue 1 year ago • 2 comments

Hi everyone,

Has anyone tried fine-tuning DonUT for key information extraction on a corpus of documents that are partly digital and partly handwritten? Specifically, I am wondering whether anyone has evidence of how it performs on handwritten text, given that all the suggestions for generating a synthetic pre-training dataset with SynthDoG focus on selecting appropriate fonts for the digital text.

I have a private corpus of invoices similar in nature to CORD (with slightly more variability in shape, size and format), but some of them contain sections of handwritten text in addition to, or in place of, digital text.

DiTo97 avatar May 09 '23 14:05 DiTo97

I can confirm that it also picks up handwritten information.

Toon-nooT avatar May 09 '23 18:05 Toon-nooT

> I can confirm that it also picks up handwritten information.

Thank you @Toon-nooT,

Could you share one example document with handwritten text that you tested DonUT on?

No stress if it's not possible.

DiTo97 avatar May 09 '23 18:05 DiTo97