
Classify blocks as handwritten or printed text

Open karnsaurabhkumar opened this issue 2 years ago • 3 comments

🚀 The feature

It would be good to have a built-in classifier that can distinguish handwritten blocks from printed ones. This is useful because most of the documents I have seen contain both, and it is sometimes important to process handwritten text differently from printed text.

Motivation, pitch

I am working on extracting text from medical documents in which some parts, such as the hospital and doctor details, are printed, while the medicine names and dosages are handwritten. I would like to be able to treat the two differently when running text extraction.
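
A minimal sketch of the kind of routing described here, assuming a classifier exists. Nothing below is part of doctr: `Block`, `route_blocks`, and `dummy_classify` are hypothetical names used only for illustration, and the stand-in classifier looks at the text rather than the image crop a real one would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Block:
    """A detected text region. Illustrative only, not a doctr type."""
    text: str
    kind: str = "unknown"  # set to "printed" or "handwritten" once classified


def route_blocks(
    blocks: List[Block],
    classify: Callable[[Block], str],
) -> Dict[str, List[Block]]:
    """Group blocks by the label that the (hypothetical) classifier assigns."""
    routed: Dict[str, List[Block]] = {"printed": [], "handwritten": []}
    for block in blocks:
        block.kind = classify(block)
        # setdefault keeps the function usable if the classifier returns another label
        routed.setdefault(block.kind, []).append(block)
    return routed


def dummy_classify(block: Block) -> str:
    """Stand-in for a real model: treats all-lowercase text as handwritten."""
    return "handwritten" if block.text.islower() else "printed"


blocks = [Block("City Hospital"), Block("paracetamol 500mg")]
routed = route_blocks(blocks, dummy_classify)
```

Each group in `routed` could then be passed to a different recognition or post-processing step, which is the separate treatment requested above.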

Alternatives

No response

Additional context

https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet


karnsaurabhkumar, Mar 24 '22 15:03

We are working internally on handwritten text recognition. It could indeed be useful to also work on a classifier that determines the type of text ahead of the text recognition model.

charlesmindee, Apr 13 '22 11:04

Did anyone make any progress on this feature?

ArsalanYounus007, Jul 20 '23 16:07

@charlesmindee we're keenly interested in the handwritten text recognition feature as well. Has there been any progress or can you provide an estimated timeline?

ffalkenberg, Oct 04 '23 09:10