NielsRogge
Hi, could you open an issue on the Optimum library regarding this? They will be happy to help you there.
+1 also encountering this bug
Refer to my notebook here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/FUNSD/True_inference_with_LayoutLMv2ForTokenClassification_%2B_Gradio_demo.ipynb.
That's definitely possible; you just need a list of words and corresponding coordinates + labels for each document.
As said above, the only thing you need for each document page is a list of words + corresponding coordinates (bounding boxes) and labels. For FUNSD, you can see that...
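As a concrete (hypothetical) illustration of that format, a single document page could be represented like this — the words, pixel coordinates, and label names below are made up, not taken from FUNSD:

```python
# One dict per document page: parallel lists of words, boxes, and labels.
# Boxes are [x_min, y_min, x_max, y_max] in pixel coordinates.
page = {
    "words": ["R&D", "Budget", "1997"],
    "boxes": [[57, 34, 120, 56], [130, 34, 210, 56], [220, 34, 280, 56]],
    "labels": ["B-HEADER", "I-HEADER", "O"],
}

# The three lists must stay aligned: one box and one label per word.
assert len(page["words"]) == len(page["boxes"]) == len(page["labels"])
```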
I've uploaded a notebook to fine-tune OneFormer here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/OneFormer/Fine_tune_OneFormer_for_semantic_segmentation.ipynb. Hope it helps!
Hi, to perform inference with LiLT, you don't need a processor, as the model only takes text and corresponding bounding boxes as input; a tokenizer is enough. Inference can...
LiLT, like LayoutLM models, depends on an OCR engine of choice. You'll first need to run the OCR on the image to get a list of words + corresponding boxes.
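Once the OCR has produced word-level boxes, they still have to be expanded to token level, since the tokenizer may split one word into several tokens. A minimal sketch of that alignment step, using the `word_ids()` mapping that fast tokenizers expose (the helper name and the dummy box for special tokens are my own choices, not library API):

```python
def align_boxes(word_ids, word_boxes, special_box=(0, 0, 0, 0)):
    """Expand word-level boxes to token level: every token belonging to a
    word shares that word's box; special tokens (word_id is None) get a
    dummy box."""
    return [list(special_box) if i is None else word_boxes[i] for i in word_ids]

# word_ids as returned by a fast tokenizer's encoding.word_ids():
# None for special tokens, otherwise the index of the originating word.
word_ids = [None, 0, 0, 1, None]
word_boxes = [[50, 40, 180, 60], [140, 80, 220, 100]]

token_boxes = align_boxes(word_ids, word_boxes)
# → one box per token, with dummy boxes at the [CLS]/[SEP] positions
```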
Hi, the `input_ids` need to be PyTorch tensors; in your case they are still lists of integers. You can fix this by adding `return_tensors="pt"` to the tokenizer call line:...
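If re-running the tokenizer is inconvenient, a list-valued encoding can also be converted to a tensor by hand (the token ids below are illustrative, not from a real tokenizer):

```python
import torch

# input_ids as plain Python lists, e.g. what a tokenizer returns
# without return_tensors="pt"
input_ids_list = [[101, 7592, 2088, 102]]

# Convert to the (batch_size, seq_len) tensor the model expects
input_ids = torch.tensor(input_ids_list)
assert input_ids.shape == (1, 4)
```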
Your boxes need to be a tensor of shape (batch_size, seq_len, 4), and they need to be normalized to the 0-1000 range by the size of the image:

```
def normalize_bbox(bbox, width, height):
    return [
        int(1000 * bbox[0] / width),
        int(1000 * bbox[1] / height),
        int(1000 * bbox[2] / width),
        int(1000 * bbox[3] / height),
    ]
```
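Once every box is normalized, stacking the per-page boxes and adding a batch dimension gives the expected (batch_size, seq_len, 4) tensor. The values below are illustrative and assumed to be already in the 0-1000 range:

```python
import torch

# One page with two (already normalized) word boxes
norm_boxes = [[65, 40, 236, 60], [183, 80, 288, 100]]

# Wrap in a list to add the batch dimension -> (batch_size, seq_len, 4)
bbox = torch.tensor([norm_boxes])
assert bbox.shape == (1, 2, 4)
```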