388 comments by NielsRogge

Ok, thanks for the detailed answer! Each model in HuggingFace Transformers requires 3 files to be implemented:

* `configuration_segformer.py`, which defines the hyperparameters
* `modeling_segformer.py`, which implements the model
* ...
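
For illustration, a minimal sketch of what such a configuration file contains; the hyperparameter names here are illustrative only (the actual `SegformerConfig` defines many more):

```python
from transformers import PretrainedConfig

class SegformerConfig(PretrainedConfig):
    """Minimal configuration sketch; hyperparameter names are illustrative."""

    model_type = "segformer"

    def __init__(self, num_channels=3, hidden_sizes=(32, 64, 160, 256), num_labels=150, **kwargs):
        super().__init__(**kwargs)
        self.num_channels = num_channels
        self.hidden_sizes = list(hidden_sizes)
        self.num_labels = num_labels
```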

Ok great, thanks for the response. I've just finished the conversion script (which lets me convert the original checkpoints to their HuggingFace counterparts). Currently, it only complains about the following...
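
Such a conversion script typically boils down to renaming state-dict keys; a minimal sketch, where the checkpoint filename, dict layout, and rename rule are purely hypothetical:

```python
import torch

# Load the original checkpoint (filename and dict layout are hypothetical)
state_dict = torch.load("segformer.b0.512x512.ade.160k.pth", map_location="cpu")["state_dict"]

# Rename keys to the HuggingFace naming scheme (illustrative rule only)
new_state_dict = {}
for key, value in state_dict.items():
    new_state_dict[key.replace("backbone.", "segformer.encoder.")] = value

# Then load into the HF model and inspect any mismatches, e.g.:
# missing, unexpected = hf_model.load_state_dict(new_state_dict, strict=False)
```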

Ok thanks! I'm currently testing my implementation and the original one on the same image (one from ADE20k). However, when comparing the pixel values prepared by `SegFormerFeatureExtractor` to the ones...
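
A typical way to run such a comparison, sketched with stand-in tensors (in practice these would come from `SegFormerFeatureExtractor` and the original mmseg preprocessing pipeline for the same image):

```python
import torch

# Stand-ins for the two pipelines' outputs for the same image
hf_pixel_values = torch.randn(1, 3, 512, 512)
original_pixel_values = hf_pixel_values.clone()

# Check element-wise agreement up to a small tolerance
print(torch.allclose(hf_pixel_values, original_pixel_values, atol=1e-4))
print((hf_pixel_values - original_pixel_values).abs().max())
```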

Ok, thanks for the information. Yeah, in HuggingFace Transformers all feature extractors (`ViTFeatureExtractor`, `DeiTFeatureExtractor`, `DetrFeatureExtractor`) currently rely on PIL, and they are not meant to be fully-fledged preprocessors, for now...

Ok great, thanks for looking into it! Inference now works: https://colab.research.google.com/drive/1Aq2uelaRNubW1iduc2oh0kkUIYamgZkY?usp=sharing. I've uploaded the weights of the b0 model to the hub, as can be seen [here](https://huggingface.co/nielsr/segformer-b0-finetuned-ade-512-512). If the project is...
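
A sketch of inference with that uploaded b0 checkpoint; the class names below follow the released transformers API, whose capitalization differs slightly from the branch discussed here:

```python
import requests
from PIL import Image
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation

checkpoint = "nielsr/segformer-b0-finetuned-ade-512-512"
feature_extractor = SegformerFeatureExtractor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # (batch, num_labels, height/4, width/4)
```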

So the labels are set to -100 for pad pixels? Can you point me to where this happens in the code?
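
For context, -100 is PyTorch's default `ignore_index` for cross-entropy, so pad pixels labeled that way simply drop out of the loss; a small runnable sketch:

```python
import torch
import torch.nn as nn

loss_fct = nn.CrossEntropyLoss()  # ignore_index defaults to -100

logits = torch.randn(1, 150, 4, 4)         # (batch, num_classes, H, W)
labels = torch.randint(0, 150, (1, 4, 4))  # (batch, H, W)
labels[:, :, 2:] = -100                    # mark padded columns as ignored

# Only the non-padded pixels contribute to the loss
loss = loss_fct(logits, labels)
print(loss)
```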

Another question: when calculating the loss, the logits need to be upsampled again as shown here: https://github.com/NVlabs/SegFormer/blob/93301b33d7b7634b018386681be3a640f5979957/mmseg/models/decode_heads/decode_head.py#L220-L224 Why are we taking `seg_label.shape[2:]`? If I understand correctly, the input to the...

Yes, I understand that. But why not `shape[1:]` instead of `shape[2:]`? The `seg_label` has shape (batch_size, height, width), right, or not?
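
One way `shape[2:]` would make sense, sketched under the assumption that mmseg stores the labels with a singleton channel dimension, i.e. (batch_size, 1, height, width):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 150, 128, 128)   # (batch, num_classes, H/4, W/4)
seg_label = torch.zeros(2, 1, 512, 512)  # (batch, 1, H, W), assumed layout

# seg_label.shape[2:] == torch.Size([512, 512]), i.e. the spatial size,
# so the logits get upsampled to match the labels before computing the loss
upsampled = F.interpolate(logits, size=seg_label.shape[2:], mode="bilinear", align_corners=False)
print(upsampled.shape)  # torch.Size([2, 150, 512, 512])
```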

Hi, I'm also defining a `SegFormerForImageClassification`, since the SegFormer encoder can also be used to classify images. I see [here](https://github.com/NVlabs/SegFormer/blob/93301b33d7b7634b018386681be3a640f5979957/mmseg/models/backbones/mix_transformer.py#L257) that the classification head projects from the hidden size of...
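
The linked code amounts to mean-pooling the encoder's final feature map and projecting it to the classes; a minimal sketch with illustrative names and sizes:

```python
import torch
import torch.nn as nn

hidden_size, num_labels = 256, 1000  # illustrative: b0's last-stage width, ImageNet classes

# Classification head: average the patch tokens of the last stage, then project
classifier = nn.Linear(hidden_size, num_labels)

hidden_states = torch.randn(2, 196, hidden_size)  # (batch, seq_len, hidden_size)
logits = classifier(hidden_states.mean(dim=1))    # (batch, num_labels)
print(logits.shape)  # torch.Size([2, 1000])
```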

> By the way, our PVTv2 is also a very strong vision transformer backbone. Does HuggingFace consider supporting it? If you can support SegFormerForImageClassification, it is super easy to support...