[markuplm] Unable to use with Huggingface

Open louis030195 opened this issue 2 years ago • 6 comments

Describe the bug

Model: markuplm

Screenshot 2021-11-27 at 09 38 39

The problem arises when using:

  • [x] the official example scripts: (give details below)

To Reproduce

Steps to reproduce the behavior:

pip install transformers
# Or main source code "git clone https://github.com/huggingface/transformers && cd transformers && pip install ."
from transformers import AutoTokenizer, MarkupLMForPretraining
  
tokenizer = AutoTokenizer.from_pretrained("microsoft/markuplm-large")

model = MarkupLMForPretraining.from_pretrained("microsoft/markuplm-large")

ValueError: Tokenizer class MarkupLMTokenizer does not exist or is not currently imported.

Expected behavior

The tokenizer and model are properly loaded.

  • Platform: Google Colab

  • Python version: Screenshot 2021-11-27 at 09 43 45

  • PyTorch version (GPU?): Screenshot 2021-11-27 at 09 44 15

louis030195 avatar Nov 27 '21 08:11 louis030195

MarkupLM is not currently supported by Hugging Face's transformers package, so for now you can only use it by downloading our source code. We are working on making MarkupLM available in transformers soon.

lockon-n avatar Nov 27 '21 09:11 lockon-n

Hi,

I've added MarkupLM to Transformers here: https://github.com/NielsRogge/transformers/tree/modeling_markuplm/src/transformers/models/markuplm

However, I've not opened a PR yet, as I'd like to have a MarkupLMProcessor (similar to LayoutLMv2Processor) that prepares all data for the model (rather than only tokenizing text).

Feel free to work further on my branch.

NielsRogge avatar Nov 29 '21 13:11 NielsRogge

@NielsRogge Thanks for adding MarkupLM to the great transformers library! We have added a processor for MarkupLM, similar to LayoutLMv2Processor, as you requested, and opened a PR against your branch. However, this implementation is not complete, as we are not familiar with all the APIs in transformers. We would appreciate it very much if you could help us improve it and officially release it.

lockon-n avatar Dec 27 '21 03:12 lockon-n

@NielsRogge Any updates on adding MarkupLM to Transformers?

wolfshow avatar Mar 08 '22 03:03 wolfshow

@NielsRogge you are amazing. Thank you for this!

iamnafets avatar Feb 24 '23 23:02 iamnafets

MarkupLM is now part of the Transformers library, feel free to close this issue :)

  • Docs: https://huggingface.co/docs/transformers/model_doc/markuplm

  • Demo notebooks: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/MarkupLM

NielsRogge avatar Feb 25 '23 08:02 NielsRogge
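
For readers who still hit the original ValueError, the following is a minimal sketch of how one might check whether the installed transformers exposes MarkupLM before loading it. The helper names `markuplm_available` and `load_markuplm` are hypothetical and not part of any library. The sketch assumes the `microsoft/markuplm-base` checkpoint used in the linked docs, and it loads `MarkupLMModel` rather than the `MarkupLMForPretraining` class from the original report. The processor may also need the beautifulsoup4 package to parse HTML.

```python
import importlib


def markuplm_available() -> bool:
    """Return True if the installed transformers package lists MarkupLMProcessor.

    Hypothetical helper: returns False when transformers is missing or too old.
    """
    try:
        transformers = importlib.import_module("transformers")
    except ImportError:
        return False
    # dir() lists the exported names without importing each submodule.
    return "MarkupLMProcessor" in dir(transformers)


def load_markuplm(checkpoint: str = "microsoft/markuplm-base"):
    """Load the processor and base model (downloads weights on first call).

    Hypothetical helper; the checkpoint name is an assumption taken from the docs.
    """
    if not markuplm_available():
        raise RuntimeError(
            "This transformers install does not expose MarkupLM; "
            "try upgrading with `pip install -U transformers`."
        )
    from transformers import MarkupLMModel, MarkupLMProcessor

    processor = MarkupLMProcessor.from_pretrained(checkpoint)
    model = MarkupLMModel.from_pretrained(checkpoint)
    return processor, model
```

Calling `processor, model = load_markuplm()` should then give a processor that accepts raw HTML strings. If `markuplm_available()` returns False, upgrading transformers is the first thing to try.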