Yupan Huang
You may refer to LayoutLMv3's paper and BEiT's code for image masking (the Masked Image Modeling objective).
Thank you for the question and detailed description. There is no necessary relationship between evaluation metrics and evaluation loss (see [some explanations](https://datascience.stackexchange.com/questions/42599/what-is-the-relationship-between-the-accuracy-and-the-loss-in-deep-learning)). For example, when [fine-tuning FUNSD](https://huggingface.co/HYPJUDY/layoutlmv3-base-finetuned-funsd/tensorboard), the evaluation loss...
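To make the point about metrics and loss concrete, here is a minimal toy sketch. The probabilities are made up for illustration and are not taken from the FUNSD run. It shows two hypothetical checkpoints with identical accuracy, where the later one has a higher cross-entropy loss because it is more confident on the one example it gets wrong.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Fraction of examples whose highest-probability class equals the label."""
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == y for p, y in zip(probs, labels)
    )
    return hits / len(labels)


# Toy data (assumed values): four examples, two classes.
labels = [0, 1, 0, 1]

# Hypothetical earlier checkpoint: right on three examples, mildly wrong on the last.
early = [[0.7, 0.3], [0.3, 0.7], [0.7, 0.3], [0.6, 0.4]]

# Hypothetical later checkpoint: same predicted classes, but more confident,
# including on the example it gets wrong.
late = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.99, 0.01]]

print("early: acc", accuracy(early, labels), "loss", cross_entropy(early, labels))
print("late:  acc", accuracy(late, labels), "loss", cross_entropy(late, labels))
```

Accuracy only looks at the argmax, while the loss also penalizes how confident the wrong predictions are, so the two can move in different directions.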
I think the performance would be worse using the less accurate OCR (the one provided). I do not have a quantitative comparison because I did not experiment with this.
Thank you for your interest! According to the error message, something seems to have gone wrong while loading the CORD dataset. Could you check if the dataset has been downloaded successfully? The code...
It is hard to locate the cause of errors and debug without error stack traces. Have you set `IMS_PER_BATCH` to a multiple of the number of GPUs you are using? For example, if you...
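A quick way to sanity-check this before launching training is sketched below. It assumes `IMS_PER_BATCH` is the total batch size summed over all GPUs (as in Detectron2-style configs); `check_ims_per_batch` is a hypothetical helper written for illustration, not part of any library.

```python
def check_ims_per_batch(ims_per_batch, num_gpus):
    """Return the per-GPU batch size, or raise if the split is uneven.

    Hypothetical helper: assumes ims_per_batch is the total batch size
    across all GPUs.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if ims_per_batch % num_gpus != 0:
        raise ValueError(
            f"IMS_PER_BATCH={ims_per_batch} is not a multiple of {num_gpus} GPUs"
        )
    return ims_per_batch // num_gpus


# Example: 16 images per batch on 8 GPUs gives 2 images per GPU.
print(check_ims_per_batch(16, 8))
```

In a real script, `num_gpus` could be read from your launcher arguments or from `torch.cuda.device_count()`, depending on your setup.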
We have some explanations in the paper's Section 3.4: > To demonstrate the generality of LayoutLMv3 from the multimodal domain to the visual domain, we transfer LayoutLMv3 to a document...
The bbox information is a necessary input for LayoutLM models. But you can skip the manual input of bboxes and extract them automatically with an OCR processor. @NielsRogge provides two...
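If you do supply your own boxes from an OCR engine, note that the LayoutLM-family models in Hugging Face Transformers expect coordinates on a 0-1000 scale rather than in pixels. A minimal sketch of that normalization is below; `normalize_bbox` is a helper name chosen for illustration, and the `(x0, y0, x1, y1)` pixel layout is an assumption about your OCR output.

```python
def normalize_bbox(bbox, width, height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range.

    Illustrative helper: assumes integer pixel coordinates and a page
    of size width x height in pixels.
    """
    x0, y0, x1, y1 = bbox

    def scale(value, size):
        # Integer arithmetic avoids float rounding; clamp to the valid range.
        return max(0, min(1000, 1000 * value // size))

    return [scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height)]


# Example: a word box on a 1000 x 2000 pixel page.
print(normalize_bbox([50, 100, 200, 300], 1000, 2000))
```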
Many spaces use the LayoutLMv3 models. For example, you are welcome to explore more on [Spaces using microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base). I am closing this issue for now, since it is inactive.
Have you solved the problem? It looks like a network issue. Can you reach the address on your server (e.g., through `wget https://raw.githubusercontent.com/huggingface/datasets/2.4.0/metrics/seqeval/seqeval.py`)?
I am closing this issue for now, since it is inactive.