NielsRogge
You just need to make sure to normalize your bounding boxes, as the model only has position embeddings for coordinates between 0 and 1000. See here: https://huggingface.co/docs/transformers/en/model_doc/layoutlm#usage-tips (it's equivalent for LiLT)
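A minimal sketch of that normalization (the helper name `normalize_box` is my own, not from the docs): scale pixel coordinates into the 0-1000 range using the page width and height.

```python
def normalize_box(box, width, height):
    # Scale (x0, y0, x1, y1) pixel coordinates into the 0-1000 range
    # expected by LayoutLM / LiLT position embeddings.
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# A box on a 600x800 page
print(normalize_box((15, 30, 300, 60), width=600, height=800))  # → [25, 37, 500, 75]
```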
I'd recommend using the Azure Read API with readingOrder="natural".
I'd recommend taking a look here: https://github.com/facebookresearch/detr/blob/3af9fa878e73b6894ce3596450a8d9b89d918ca9/datasets/coco.py#L74-L76. The data preparation is equivalent for MaskFormer/Mask2Former/OneFormer. Basically, COCO stores segmentation masks as polygons, so you need to convert them to a set...
> The solution seems for me to write a custom dataset converter to convert my polygon annotations to the custom RGB format (R channel for classID, G channel for instance...
@Robotatron it does support it, however the image processor (which can be used to speed up data preparation) doesn't. So I'd advise preparing the data yourself for the model,...
@cyh-0 MaskFormer outputs a binary mask + class for each of its object queries (`model.config.num_queries`). If an image contains 2 semantic categories, for instance, and the model uses 100 object...
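To illustrate how per-query masks and classes combine into a semantic map, here's a toy NumPy sketch of the mechanics (the shapes and random numbers are made up; in practice the image processor's `post_process_semantic_segmentation` does this for you):

```python
import numpy as np

# Toy MaskFormer-style outputs: Q queries, each with class scores over
# C classes (+1 "no object" slot) and an H x W mask logit map.
Q, C, H, W = 5, 3, 4, 4
rng = np.random.default_rng(0)
class_logits = rng.normal(size=(Q, C + 1))   # last column = "no object"
mask_logits = rng.normal(size=(Q, H, W))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Drop the "no object" column, sigmoid the masks, then weigh each
# query's mask by its class probabilities and take a per-pixel argmax.
class_probs = softmax(class_logits)[:, :-1]        # (Q, C)
mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))    # (Q, H, W)
semantic = np.einsum("qc,qhw->chw", class_probs, mask_probs)
semantic_map = semantic.argmax(axis=0)             # (H, W) class id per pixel
print(semantic_map.shape)  # → (4, 4)
```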
Hi, Thanks for the kind words :) CLIP cannot really be used for image captioning out-of-the-box, as it only consists of 2 encoders (a vision and a text encoder). There...
Hi, We do provide a script to fine-tune CLIP and similar models on an (image, text) dataset here: https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text. Alternatively have a look at the OpenCLIP repository which also provides...
Hi yes, here's a guide: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/DocVQA/Creating_a_toy_DocVQA_dataset_for_Donut.ipynb
Creating a HF Dataset from scratch is explained here: https://huggingface.co/docs/datasets/image_dataset.
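As a sketch of the "imagefolder" layout that page describes (folder of images plus a `metadata.csv` linking `file_name` to extra columns; the dataset name and caption here are placeholders):

```python
import csv
import pathlib
import tempfile

# Build the directory layout that load_dataset("imagefolder", data_dir=...)
# expects: split folder with images and a metadata.csv alongside them.
root = pathlib.Path(tempfile.mkdtemp()) / "my_dataset" / "train"
root.mkdir(parents=True)
(root / "0001.png").write_bytes(b"")  # placeholder; real image bytes go here

with open(root / "metadata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "text"])        # header: image + caption column
    writer.writerow(["0001.png", "a toy caption"])
```

After that, `load_dataset("imagefolder", data_dir="my_dataset")` picks up the images and the metadata columns.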