donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Hi, this model is very powerful, but using it in production is hard. It would be great if we could export it to onnxruntime and quantize it so it can...
I need to do multi-class multi-label classification fine-tuning on my custom dataset. For setting up the ground truth labels, would it work well if I set `gt_parse` to {"class" :...
Hi, thank you for publishing this model :) I want to use it for document parsing. I have annotations for two kinds of PDF, 20 images per type. At...
It takes almost 300 seconds to answer a single question on CPU, and on GPU I get a CUDA out-of-memory error. The configuration is below: NVIDIA GeForce RTX 2080 Ti memory...
I noticed in your article that pre-training uses masked language modeling. Is there any plan to release this part of the code? Or is there...
I trained the model on cordv2, but I also want to plot the coordinates of the results along with the text output. How can I show/save the output image, or just return the...
The following is the error we get when we try to pass an input size of 512\*2,512\*3. Are different input resolutions/sizes not currently supported? Traceback (most recent call last): File...
My goal is to read a specific field (say, box 30) from a nationally standardized insurance claim form. The form has 40 boxes/fields in fixed locations and each box is...
It could be useful to get bounding box coordinates from Document Information Extraction task predictions. On a conventional pipeline: On Donut it could be something like:...
Hello @SamSamhuns, @gwkrsrch, @VictorAtPL I have around 60 images and 8 custom tokens. Each image contains 3-4 instances of the same key with different values, and the annotation format is like SROIE. I...