# Number of epochs for fine-tuning LayoutLMv2 on DocVQA
I'm trying to reproduce the results reported in the LayoutLMv2 paper (Table 6, row 7). Following this example, I fine-tuned the base model on the DocVQA train set for 20 epochs. The resulting model underperforms what's reported in the paper (roughly 40% of answers default to [CLS]). While I continue to debug my code, I'd like to know how many epochs were used to fine-tune the model in the original work.
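For reference, here is a minimal sketch of the kind of training step I'm running (the checkpoint name is the public Hugging Face one; the matching of the ground-truth answer against the OCR tokens to get start/end positions is omitted and replaced by placeholder arguments):

```python
import torch
from torch.optim import AdamW
from transformers import LayoutLMv2Processor, LayoutLMv2ForQuestionAnswering

# Public base checkpoint; by default the processor runs Tesseract OCR on the page image.
processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForQuestionAnswering.from_pretrained("microsoft/layoutlmv2-base-uncased")
model.train()

optimizer = AdamW(model.parameters(), lr=5e-5)

def training_step(image, question, start_position, end_position):
    """One step over a single example (sketch).

    `image` is a PIL.Image of the document page and `question` a string.
    `start_position`/`end_position` are the token indices of the answer span,
    obtained by matching the ground-truth answer against the OCR tokens
    (that matching is the part omitted in this sketch).
    """
    encoding = processor(image, question, truncation=True, return_tensors="pt")
    outputs = model(
        **encoding,
        start_positions=torch.tensor([start_position]),
        end_positions=torch.tensor([end_position]),
    )
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```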
@ArmiNouri Which OCR tool did you use for DocVQA?
I used Tesseract, following the Colab notebook (which I realize is not the same OCR you used). If I switch to MS Read, should I expect to get the same results? Alternatively, if there is a version of LayoutLMv2 that has already been fine-tuned on DocVQA, could you make it available?
I have the same issue; I tried both Tesseract and DocTR. I would appreciate it if you could publish the pre-trained models, as you did for LayoutLM.
In case this helps someone, I adapted the Colab mentioned above into a training script for the full dataset. It can train on either Tesseract OCR or the dataset's own OCR (the processor-side difference is sketched below). It was written specifically to evaluate LayoutLMv2 on DocVQA with Tesseract and demonstrate the poor performance. It's definitely not optimized, but it should be fairly usable.
https://github.com/herobd/layoutlmv2
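On the processor side, the only real difference between the two OCR modes is whether the feature extractor runs Tesseract itself or you pass in words and boxes from another source (the dataset OCR, MS Read, DocTR, ...). Roughly, a sketch (the image path, question, and word/box values below are placeholders):

```python
from PIL import Image
from transformers import (
    LayoutLMv2FeatureExtractor,
    LayoutLMv2Processor,
    LayoutLMv2TokenizerFast,
)

tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")

# Placeholder inputs: a document page image, a question, and OCR output
# (words plus their bounding boxes normalized to a 0-1000 scale).
image = Image.open("page.png").convert("RGB")
question = "What is the invoice number?"
words = ["Invoice", "number:", "12345"]
boxes = [[70, 40, 180, 60], [190, 40, 300, 60], [310, 40, 380, 60]]

# Mode 1: let the processor run Tesseract on the image itself.
tesseract_processor = LayoutLMv2Processor(
    LayoutLMv2FeatureExtractor(apply_ocr=True), tokenizer
)
encoding = tesseract_processor(image, question, truncation=True, return_tensors="pt")

# Mode 2: reuse externally provided OCR (dataset OCR, MS Read, DocTR, ...):
# disable the built-in OCR and pass the words and boxes explicitly.
external_ocr_processor = LayoutLMv2Processor(
    LayoutLMv2FeatureExtractor(apply_ocr=False), tokenizer
)
encoding = external_ocr_processor(
    image, question, words, boxes=boxes, truncation=True, return_tensors="pt"
)
```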