Transformers-Tutorials

Recreating DocVQA results for LayoutLMv2

ArmiNouri opened this issue 3 years ago • 11 comments

Related issue on the unilm repo.

I'm trying to recreate the results reported in the LayoutLMv2 paper, Table 6, row 7. Following this example, I've fine-tuned the base model on the DocVQA training set for 20 epochs (a rough sketch of the setup is below). The resulting model underperforms compared to what's reported in the paper (roughly 40% of answers default to [CLS]). I'm wondering whether:

  • anyone has been able to reproduce the results
  • the number of epochs (20) was based on the authors' original work or was chosen for demo purposes only
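
For reference, my training setup roughly follows the notebook. A minimal sketch (the learning rate, device handling, and the exact batch contents here are my own choices, not taken from the paper):

from transformers import LayoutLMv2ForQuestionAnswering
import torch

# assumes train_dataloader yields batches with input_ids, attention_mask,
# token_type_ids, bbox, image, start_positions and end_positions as tensors
model = LayoutLMv2ForQuestionAnswering.from_pretrained("microsoft/layoutlmv2-base-uncased")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for epoch in range(20):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()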

ArmiNouri avatar Nov 01 '21 21:11 ArmiNouri

Hi!

The number of epochs was set arbitrarily, for demo purposes only.

Apparently, the Microsoft authors used a couple of tricks (which they didn't share) in order to come up with the results on DocVQA as reported in the paper.

I personally also wonder how they managed to get such a high score, as LayoutLMv2 requires an external OCR engine, which would work quite badly on handwritten documents. However, with new models such as TrOCR, this might become easier.
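
For example, recognizing a single handwritten text line with TrOCR is already straightforward. A minimal sketch (uses the microsoft/trocr-base-handwritten checkpoint; the file path is a placeholder, and a full page would still need to be segmented into text lines first):

from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# TrOCR expects a single text-line image, not a full page
image = Image.open("text_line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])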

NielsRogge avatar Nov 02 '21 14:11 NielsRogge

Thank you for the quick reply. I've followed up with the authors and will share if I find anything out. Your notebooks really helped and are a great resource. Thank you.

ArmiNouri avatar Nov 02 '21 17:11 ArmiNouri

While implementing LayoutLMv2 for DocVQA, I am not able to use LayoutLMv2FeatureExtractor to create dataset_with_ocr. I am getting the following error: ArrowNotImplementedError: Unsupported cast from list<item: list<item: list<item: uint8>>> to utf8 using function cast_string. I don't understand it or why it is happening. Please help.

anupamadeo avatar Nov 22 '21 18:11 anupamadeo

@anupamadeo if I recall correctly, I had the same issue. It was because the mapping function was trying to recast the image column to a new type. What helped me was writing to a temporary new column (images) and casting it back to image at the end of the process.

from PIL import Image
from transformers import LayoutLMv2FeatureExtractor

feature_extractor = LayoutLMv2FeatureExtractor()

def get_ocr_words_and_boxes(examples):
    # load a batch of document images
    images = [Image.open(root_dir + image_file).convert("RGB") for image_file in examples['image']]
    # resize every image to 224x224 + apply Tesseract to get words + normalized boxes
    encoded_inputs = feature_extractor(images)
    # write to a temporary 'images' column so Arrow doesn't try to recast the original 'image' column
    examples['images'] = encoded_inputs.pixel_values
    examples['words'] = encoded_inputs.words
    examples['boxes'] = encoded_inputs.boxes
    return examples

dataset_with_ocr = dataset.map(get_ocr_words_and_boxes, batched=True, batch_size=10)
# move the pixel values back to the 'image' column and drop the temporary one
dataset_with_ocr = dataset_with_ocr.map(lambda example: {'image': example['images']}, remove_columns=['images'])
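
If it helps, a quick sanity check after the two maps (column names as in the snippet above):

print(dataset_with_ocr.column_names)      # should now include 'words' and 'boxes'
print(dataset_with_ocr[0]['words'][:10])  # first few OCR'd words of the first document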

ArmiNouri avatar Nov 22 '21 19:11 ArmiNouri

Thanks for such a quick reply. It solved my problem.

anupamadeo avatar Nov 22 '21 19:11 anupamadeo

Hi, is there any way to train the tokenizer in LayoutLMv2 for domain-specific vocabulary?

anupamadeo avatar Nov 24 '21 05:11 anupamadeo

Has anyone reached the scores reported by Microsoft for layoutlmv2 on DocVQA? I was able to train the model on the training data, but my ANLS scores on the DocVQA validation data are quite low; I was only able to get around 40. @NielsRogge @tiennvcs @ArmiNouri
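
For reference, ANLS as defined for DocVQA is the average, over questions, of the best normalized-Levenshtein similarity against the ground-truth answers, zeroed out below a 0.5 threshold. A minimal pure-Python sketch for sanity-checking local scores (function and variable names are mine):

def edit_distance(a, b):
    # standard Levenshtein distance via dynamic programming
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def anls(predictions, gold_answers, threshold=0.5):
    # predictions: list of strings; gold_answers: list of lists of acceptable strings
    total = 0.0
    for pred, answers in zip(predictions, gold_answers):
        best = 0.0
        for ans in answers:
            p, g = pred.strip().lower(), ans.strip().lower()
            nl = edit_distance(p, g) / max(len(p), len(g), 1)
            best = max(best, 1 - nl if nl < threshold else 0.0)
        total += best
    return total / len(predictions)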

sujit420 avatar Nov 29 '21 12:11 sujit420

I am trying LayoutLMv2 on a new dataset for question answering, using the same code given in the notebook. I want to train and test it, but I am not able to create batches for the test set. I am also new to PyTorch; I have worked with TensorFlow. Kindly help.

anupamadeo avatar Dec 23 '21 17:12 anupamadeo

"not able to create batches for the test set"

You will have to post your error before anyone can help.

sujit420 avatar Dec 24 '21 05:12 sujit420

I submitted my result on the DocVQA website today, but it doesn't have a score. Does anyone know the reason?

dongxuewang-123 avatar Jan 05 '22 03:01 dongxuewang-123

Would anyone mind reporting their best ANLS scores using Tesseract?

@dongxuewang-123 their server had a bug which should be fixed now

herobd avatar Mar 19 '22 10:03 herobd