388 comments by NielsRogge

Ok, thanks for the detailed answer! Each model in HuggingFace Transformers requires 3 files to be implemented:

* `configuration_segformer.py`, which defines the hyperparameters
* `modeling_segformer.py`, which implements the model
* ...
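
For illustration, a minimal sketch of what such a configuration file contains; the hyperparameter names here are illustrative only (the actual `SegformerConfig` defines many more):

```python
from transformers import PretrainedConfig

class SegformerConfig(PretrainedConfig):
    """Minimal configuration sketch; hyperparameter names are illustrative."""

    model_type = "segformer"

    def __init__(self, num_channels=3, hidden_sizes=(32, 64, 160, 256), num_labels=150, **kwargs):
        super().__init__(**kwargs)
        self.num_channels = num_channels
        self.hidden_sizes = list(hidden_sizes)
        self.num_labels = num_labels
```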

Ok great, thanks for the response. I've just finished the conversion script (which lets me convert the original checkpoints to their HuggingFace counterparts). Currently, it only complains about the following...
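
Such a conversion script typically boils down to renaming state-dict keys; a minimal sketch, where the checkpoint filename, dict layout, and rename rule are purely hypothetical:

```python
import torch

# Load the original checkpoint (filename and dict layout are hypothetical)
state_dict = torch.load("segformer.b0.512x512.ade.160k.pth", map_location="cpu")["state_dict"]

# Rename keys to the HuggingFace naming scheme (illustrative rule only)
new_state_dict = {}
for key, value in state_dict.items():
    new_state_dict[key.replace("backbone.", "segformer.encoder.")] = value

# Then load into the HF model and inspect any mismatches, e.g.:
# missing, unexpected = hf_model.load_state_dict(new_state_dict, strict=False)
```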

Ok thanks! I'm currently testing my implementation and the original one on the same image (one from ADE20k). However, when comparing the pixel values prepared by `SegFormerFeatureExtractor` to the ones...
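
A typical way to run such a comparison, sketched with stand-in tensors (in practice these would come from `SegFormerFeatureExtractor` and the original mmseg preprocessing pipeline for the same image):

```python
import torch

# Stand-ins for the two pipelines' outputs for the same image
hf_pixel_values = torch.randn(1, 3, 512, 512)
original_pixel_values = hf_pixel_values.clone()

# Check element-wise agreement up to a small tolerance
print(torch.allclose(hf_pixel_values, original_pixel_values, atol=1e-4))
print((hf_pixel_values - original_pixel_values).abs().max())
```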

Ok, thanks for the information. Yeah, in HuggingFace Transformers all feature extractors (`ViTFeatureExtractor`, `DeiTFeatureExtractor`, `DetrFeatureExtractor`) currently rely on PIL, and they are not meant to be fully-fledged preprocessors, for now...

Ok great, thanks for looking into it! Inference now works: https://colab.research.google.com/drive/1Aq2uelaRNubW1iduc2oh0kkUIYamgZkY?usp=sharing. I've uploaded the weights of the b0 model to the hub, as can be seen [here](https://huggingface.co/nielsr/segformer-b0-finetuned-ade-512-512). If the project is...
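
A sketch of inference with that uploaded b0 checkpoint; the class names below follow the released transformers API, whose capitalization differs slightly from the branch discussed here:

```python
import requests
from PIL import Image
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation

checkpoint = "nielsr/segformer-b0-finetuned-ade-512-512"
feature_extractor = SegformerFeatureExtractor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # (batch, num_labels, height/4, width/4)
```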

So the labels are set to -100 for pad pixels? Can you point me to where this happens in the code?
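
For context, -100 is PyTorch's default `ignore_index` for cross-entropy, so pad pixels labeled that way simply drop out of the loss; a small runnable sketch:

```python
import torch
import torch.nn as nn

loss_fct = nn.CrossEntropyLoss()  # ignore_index defaults to -100

logits = torch.randn(1, 150, 4, 4)         # (batch, num_classes, H, W)
labels = torch.randint(0, 150, (1, 4, 4))  # (batch, H, W)
labels[:, :, 2:] = -100                    # mark padded columns as ignored

# Only the non-padded pixels contribute to the loss
loss = loss_fct(logits, labels)
print(loss)
```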

Another question: when calculating the loss, the logits need to be upsampled again as shown here: https://github.com/NVlabs/SegFormer/blob/93301b33d7b7634b018386681be3a640f5979957/mmseg/models/decode_heads/decode_head.py#L220-L224 Why are we taking `seg_label.shape[2:]`? If I understand correctly, the input to the...

Yes, I understand that. But why not `shape[1:]` instead of `shape[2:]`? The `seg_label` has shape (batch_size, height, width), right, or not?
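
One way `shape[2:]` would make sense, sketched under the assumption that mmseg stores the labels with a singleton channel dimension, i.e. (batch_size, 1, height, width):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 150, 128, 128)   # (batch, num_classes, H/4, W/4)
seg_label = torch.zeros(2, 1, 512, 512)  # (batch, 1, H, W), assumed layout

# seg_label.shape[2:] == torch.Size([512, 512]), i.e. the spatial size,
# so the logits get upsampled to match the labels before computing the loss
upsampled = F.interpolate(logits, size=seg_label.shape[2:], mode="bilinear", align_corners=False)
print(upsampled.shape)  # torch.Size([2, 150, 512, 512])
```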

Hi, I'm also defining a `SegFormerForImageClassification`, since the SegFormer encoder can also be used to classify images. I see [here](https://github.com/NVlabs/SegFormer/blob/93301b33d7b7634b018386681be3a640f5979957/mmseg/models/backbones/mix_transformer.py#L257) that the classification head projects from the hidden size of...
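
The linked code amounts to mean-pooling the encoder's final feature map and projecting it to the classes; a minimal sketch with illustrative names and sizes:

```python
import torch
import torch.nn as nn

hidden_size, num_labels = 256, 1000  # illustrative: b0's last-stage width, ImageNet classes

# Classification head: average the patch tokens of the last stage, then project
classifier = nn.Linear(hidden_size, num_labels)

hidden_states = torch.randn(2, 196, hidden_size)  # (batch, seq_len, hidden_size)
logits = classifier(hidden_states.mean(dim=1))    # (batch, num_labels)
print(logits.shape)  # torch.Size([2, 1000])
```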

> By the way, our PVTv2 is also a very strong vision transformer backbone. Does HuggingFace consider supporting it? If you can support SegFormerForImageClassification, it is super easy to support...