
Results 694 comments of NielsRogge

Hi, could you open an issue on the Optimum library regarding this? They will be happy to help you.

+1 also encountering this bug

Refer to my notebook here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/FUNSD/True_inference_with_LayoutLMv2ForTokenClassification_%2B_Gradio_demo.ipynb.

That's definitely possible; you just need a list of words and corresponding coordinates + labels for each document.

As said above, the only thing you need for each document page is a list of words + corresponding coordinates (bounding boxes) and labels. For FUNSD, you can see that...
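To make the required format concrete, one annotated page can be sketched as plain Python structures — one list of words, one of boxes, one of labels. The dict keys and label names below are illustrative, not a required schema:

```python
# A minimal sketch of one annotated document page in the format described
# above. The keys and label names are illustrative, not a required schema.
example = {
    "words": ["Invoice", "Date:", "2021-01-01"],
    # Boxes as [x0, y0, x1, y1], already normalized to the 0-1000 range
    # used by the LayoutLM family of models.
    "boxes": [[50, 40, 180, 60], [200, 40, 260, 60], [270, 40, 400, 60]],
    "labels": ["B-HEADER", "B-QUESTION", "B-ANSWER"],
}

# Every word needs exactly one box and one label.
assert len(example["words"]) == len(example["boxes"]) == len(example["labels"])
```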

I've uploaded a notebook to fine-tune OneFormer here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/OneFormer/Fine_tune_OneFormer_for_semantic_segmentation.ipynb. Hope it helps!

Hi, to perform inference with LiLT, you don't need a processor, as the model only takes text and corresponding bounding boxes as input. You only need a tokenizer. Inference can...

LiLT, like LayoutLM models, depends on an OCR engine of choice. You'll first need to run the OCR on the image to get a list of words + corresponding boxes.
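The OCR step above can be sketched as follows. The `ocr` dict below mimics the shape of pytesseract's `image_to_data(..., output_type=Output.DICT)` result; in practice you would obtain it by running Tesseract (or any OCR engine of choice) on the page image, and the sample values here are made up for illustration:

```python
# Sketch: turning OCR output into the (words, boxes) lists the model expects.
# `ocr` mimics pytesseract's image_to_data dict; values are illustrative.
ocr = {
    "text": ["Invoice", "", "Total:"],
    "left": [50, 0, 50],
    "top": [40, 0, 120],
    "width": [130, 0, 60],
    "height": [20, 0, 20],
}

words, boxes = [], []
for text, x, y, w, h in zip(
    ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"]
):
    if not text.strip():  # skip empty OCR hits
        continue
    words.append(text)
    boxes.append([x, y, x + w, y + h])  # [x0, y0, x1, y1]

print(words)  # ['Invoice', 'Total:']
print(boxes)  # [[50, 40, 180, 60], [50, 120, 110, 140]]
```

The resulting words and boxes can then be passed to the tokenizer together.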

Hi, the `input_ids` need to be PyTorch tensors. However, in your case they are still lists of integers. You can fix this by adding `return_tensors="pt"` to the tokenizer call line:...

Your boxes need to be a tensor of shape (batch_size, seq_len, 4), and they need to be normalized by the size of the image:

```python
def normalize_bbox(bbox, width, height):
    return [
        int(1000 * bbox[0] / width),
        int(1000 * bbox[1] / height),
        int(1000 * bbox[2] / width),
        int(1000 * bbox[3] / height),
    ]
```
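A self-contained sketch of that normalization and the expected (batch_size, seq_len, 4) shape, assuming a 0-1000 target range as used by the LayoutLM family and made-up pixel coordinates for illustration:

```python
def normalize_bbox(bbox, width, height):
    # Scale pixel coordinates [x0, y0, x1, y1] to the 0-1000 range.
    return [
        int(1000 * bbox[0] / width),
        int(1000 * bbox[1] / height),
        int(1000 * bbox[2] / width),
        int(1000 * bbox[3] / height),
    ]

width, height = 800, 1000  # size of the page image, in pixels (illustrative)
pixel_boxes = [[50, 40, 180, 60], [200, 40, 260, 60]]

normalized = [normalize_bbox(b, width, height) for b in pixel_boxes]
print(normalized)  # [[62, 40, 225, 60], [250, 40, 325, 60]]

# Stacked for a single document, this gives shape (batch_size=1, seq_len=2, 4),
# which can then be converted to a tensor.
batch = [normalized]
assert len(batch) == 1 and len(batch[0]) == 2 and len(batch[0][0]) == 4
```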