
how to finetune a model on a downstream task that is the same as the pre-training task?

Open SleepEarlyLiveLong opened this issue 2 years ago • 4 comments

I want to finetune a model based on "naver-clova-ix/donut-base" on a downstream task that is different from the three tasks mentioned in the paper (Document Classification, Document Information Extraction, and Document Visual Question Answering) but the same as the pre-training task; in other words, I want to teach the model to "read" better. For that task, I will feed an image into donut-base, expect the model to output all the text in the image, compute the loss against the prepared ground-truth text, and then use that loss for backpropagation. My question is: how should I modify the released code or configuration files? Thank you!

SleepEarlyLiveLong avatar Feb 21 '23 06:02 SleepEarlyLiveLong

I have the same need. Have you solved it yet?

willpat1213 avatar Mar 13 '23 10:03 willpat1213

@moonbings @SamSamhuns @eltociear, do any of you have a solution for this? I think it would be helpful to include a train_pretrain.yaml in the config folder!

ChrisDelClea avatar Mar 28 '23 20:03 ChrisDelClea

Isn't that the same as finetuning on information extraction, with a ground truth that labels all the tokens on the document you want your model to learn to read?
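To illustrate the point: Donut flattens its JSON ground truth into a token sequence, so a "reading" target is structurally just an extraction target with a single field holding the whole page text. Below is a minimal sketch of that JSON-to-token conversion (hedged: the real conversion in the donut repo also handles lists, sorts keys, and registers `<s_...>` markers as special tokens):

```python
def json2token(obj):
    """Minimal sketch of Donut-style JSON-to-token-sequence conversion.

    Each dict key k becomes an <s_k>...</s_k> wrapper around its
    (recursively converted) value. This is a simplified stand-in for
    the conversion in the official donut codebase.
    """
    if isinstance(obj, dict):
        return "".join(
            f"<s_{k}>{json2token(v)}</s_{k}>" for k, v in obj.items()
        )
    return str(obj)

# An information-extraction ground truth and a reading ground truth go
# through the same conversion; reading is just extraction with one
# field ("text_sequence") containing every token on the page.
ie_target = json2token({"menu": {"nm": "latte", "price": "4.50"}})
read_target = json2token({"text_sequence": "word1 word2 word3"})
print(read_target)  # <s_text_sequence>word1 word2 word3</s_text_sequence>
```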

AmT42 avatar May 24 '23 20:05 AmT42

It is just the pseudo text-reading task, as described in the docs:

For the (Pseudo) Text Reading Task, the gt_parse looks like `{"text_sequence": "word1 word2 word3 ..."}`
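As a concrete example, here is a small helper that builds one dataset entry for the reading task using that gt_parse shape. The surrounding `file_name`/`ground_truth` keys follow the metadata.jsonl layout used in Donut's dataset examples, but please double-check them against the version of the repo you are running:

```python
import json

def make_reading_sample(file_name, text):
    """Build one metadata-style entry for the (pseudo) text-reading task.

    gt_parse follows the convention quoted above:
    {"text_sequence": "word1 word2 word3 ..."}; ground_truth is stored
    as a JSON string, as in Donut's sample datasets.
    """
    ground_truth = {"gt_parse": {"text_sequence": text}}
    return {"file_name": file_name, "ground_truth": json.dumps(ground_truth)}

# One line of a metadata.jsonl file for a page image and its full text:
sample = make_reading_sample("page_0001.png", "word1 word2 word3")
print(json.dumps(sample))
```

Writing one such line per training image gives you a finetuning dataset that reuses the standard information-extraction pipeline for the reading objective.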

bugface avatar Nov 15 '23 20:11 bugface