donut issues

[Feature] Exporting Donut to ONNX and quantization

5

Hi, this model is very powerful, but using it in production is hard. It would be great if we could export it to ONNX Runtime and quantize it so it can...

WaterKnight1998

multi-class multi-label classification fine-tuning

I need to do multi-class multi-label classification fine-tuning on my custom dataset. For setting up the ground truth labels, would it work well if I set `gt_parse` to {"class" :...

jackkwok
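For the multi-label question above, a minimal sketch of how one `metadata.jsonl` line could be assembled. It assumes the dataset layout described in the Donut README, where `ground_truth` is a JSON string that wraps `gt_parse`. The `classes` key, the helper name `make_metadata_line`, the file name, and the label names are all hypothetical; whether a list of `{"class": ...}` entries trains well is exactly what the issue asks, so treat this as an illustration of the file format only.

```python
import json


def make_metadata_line(file_name, labels):
    """Build one metadata.jsonl line carrying several class labels.

    Hypothetical layout: each label becomes its own {"class": ...} entry
    inside a list stored under a made-up "classes" key.
    """
    gt_parse = {"classes": [{"class": label} for label in labels]}
    # ground_truth is stored as a JSON *string*, not as a nested object.
    ground_truth = json.dumps({"gt_parse": gt_parse})
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth})


# Hypothetical image name and labels, for illustration only.
line = make_metadata_line("invoice_001.png", ["invoice", "handwritten"])
print(line)
```

Each image in a split contributes one such line to that split's `metadata.jsonl`.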

Working very well on training but doing pretty poorly at inference

12

Hi, thank you for publishing this model :) I want to use it for document parsing. I have annotations for two kinds of PDF, 20 images per type. At...

WaterKnight1998

Inference is very slow on CPU for the DocVQA task, and CUDA runs out of memory on GPU

It takes almost 300 seconds to answer a single question on CPU, and on GPU I get a CUDA out-of-memory error. The configuration is below: NVIDIA GeForce RTX 2080 Ti memory...

roburst2

How to do pre-training?

I noticed in your article that pre-training uses masked language modeling. Is there any plan to release this part of the code? Or is there...

daeing

How to see the plotted bboxes on the image as a result along with the JSON output

1

I trained the model on cordv2, but I also want to plot the coordinates of the results along with the text output. How can I show/save the output image, or just return the...

deepanshudashora

Different input resolution throws error

1

We get the following error when we pass an input size of 512\*2, 512\*3. Are different input resolutions/sizes currently not supported? Traceback (most recent call last): File...

Souvic

Question on fine-tuning document form parsing labeling requirement

My goal is to read a specific field (say, box 30) from a nationally standardized insurance claim form. The form has 40 boxes/fields in fixed locations and each box is...

jackkwok

Add bounding box coordinates to predictions

27

It would be useful to get bounding box coordinates from Document Information Extraction task predictions. With a conventional pipeline: ![Screenshot from 2022-09-05 06-33-35](https://user-images.githubusercontent.com/71890227/188361130-959b1c7b-eaf4-45be-a1f9-ec6bb0bcf0b9.png) With Donut it could be something like:...

underthesand

How many images are required at minimum for training

2

Hello @SamSamhuns, @gwkrsrch, @VictorAtPL I have around 60 images and 8 custom tokens. Each image contains 3-4 instances of the same key with different values, and the annotation format is like SROIE. I...

qustions
