PVT: Use PVTv2 as a backbone for a custom model

Hello,

I want to use PVTv2 as the backbone for my human pose estimation model, which is not based on mmcv. How should I use it correctly?

Thanks in advance!

Aug 20 '21 08:08 EckoTan0804

Hi,

I have the same question as the comment above. I want to use PVTv2 as the backbone for a tracking model. I tried concatenating the tensors from the four stages and zero-padding them into a single tensor, but I don't think that is the right way. How should I use your model correctly?

Thank you.

Aug 21 '21 17:08 hoangtv2000

@hoangtv2000 Hello, did you split PVTv2 into 4 separate stages or use an iterative method? I'm a bit confused.

Aug 23 '21 04:08 khawar-islam

I just concatenate the 4 tensors along shape[1] (the channel dimension) and zero-pad shape[2] and shape[3] (height and width) of the smaller tensors so that they match the largest one.

But I don't think that is the right way: the concatenated feature map is four times larger than the one from my modified ResNet. Specifically, the concatenated feature map is [1, 1024, 64, 64], while the ResNet feature map is [1, 1024, 32, 32].
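The shape arithmetic behind these numbers can be sketched as follows. None of these values are stated in the thread; they are assumptions: PVTv2 stage widths of 64, 128, 320 and 512 (as in the b1 to b5 variants), output strides of 4, 8, 16 and 32, a 256 x 256 input whose sides are divisible by 32, and a modified ResNet with an output stride of 8. The helper names are hypothetical.

```python
# Assumed PVTv2 stage widths (b1-b5 style) and output strides.
PVT_WIDTHS = (64, 128, 320, 512)
PVT_STRIDES = (4, 8, 16, 32)


def pvt_stage_shapes(batch, height, width, widths=PVT_WIDTHS, strides=PVT_STRIDES):
    """NCHW shape of each stage output, assuming height and width are divisible by 32."""
    return [(batch, c, height // s, width // s) for c, s in zip(widths, strides)]


def concat_with_zero_padding(shapes):
    """Shape after zero-padding every map to the largest H and W, then concatenating on dim 1."""
    batch = shapes[0][0]
    channels = sum(s[1] for s in shapes)
    max_h = max(s[2] for s in shapes)
    max_w = max(s[3] for s in shapes)
    return (batch, channels, max_h, max_w)


stages = pvt_stage_shapes(1, 256, 256)  # assumed 256 x 256 input
print(stages)  # [(1, 64, 64, 64), (1, 128, 32, 32), (1, 320, 16, 16), (1, 512, 8, 8)]

padded = concat_with_zero_padding(stages)
print(padded)  # (1, 1024, 64, 64)

resnet = (1, 1024, 256 // 8, 256 // 8)  # assumed stride-8 modified ResNet output
print(resnet)  # (1, 1024, 32, 32)

# Ratio of spatial positions: 64 * 64 / (32 * 32)
print((padded[2] * padded[3]) // (resnet[2] * resnet[3]))  # 4
```

Under these assumptions the fourfold difference is a resolution mismatch (stride 4 vs. stride 8), not a channel mismatch: zero padding grows every map to the stage-1 size. The padded stages 2 to 4 are also mostly zeros, and their values no longer line up spatially with the stage-1 map.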

Aug 23 '21 05:08 hoangtv2000

Right. I am splitting the model into four separate stages for better understanding. Could you please help me separate the four stages? This implementation only covers PVT, but I need one for PVTv2: https://github.com/ofsoundof/LocalViT/blob/main/models/pvt.py

Aug 23 '21 05:08 khawar-islam

Hi,

The output of https://github.com/whai362/PVT/blob/16eabba29aca820e785a8def1ec73bb805c2daec/detection/pvt_v2.py#L308 is a list containing the feature map output by each stage. In other words, there are 4 feature maps in total, as described in the PVTv1 paper (see Figure 3).
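Instead of zero padding, one common way to consume that list is to resize every stage to a single reference resolution, concatenate along the channel dimension, and project with a 1x1 convolution. The sketch below assumes PyTorch. `StandInPVTv2` and `FuseStages` are hypothetical names: the stand-in only mimics the output contract (four NCHW maps at strides 4, 8, 16 and 32, with assumed widths 64, 128, 320 and 512), so in practice it would be replaced by the model from pvt_v2.py. The fusion scheme is an illustration, not something the PVT repository prescribes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StandInPVTv2(nn.Module):
    """Hypothetical stand-in for the PVTv2 backbone: forward() returns a list of
    4 NCHW feature maps at strides 4, 8, 16 and 32 (widths are assumed)."""

    def __init__(self, widths=(64, 128, 320, 512)):
        super().__init__()
        in_chs = (3,) + tuple(widths[:-1])
        strides = (4, 2, 2, 2)  # stage 1 downsamples by 4, later stages by 2
        # Plain strided convs stand in for PVTv2's patch embedding + attention blocks.
        self.stages = nn.ModuleList(
            nn.Conv2d(c_in, c_out, kernel_size=s, stride=s)
            for c_in, c_out, s in zip(in_chs, widths, strides)
        )

    def forward(self, x):
        outs = []
        for stage in self.stages:  # iterate over the stages, keeping every output
            x = stage(x)
            outs.append(x)
        return outs


class FuseStages(nn.Module):
    """Hypothetical neck: resize all stages to the resolution of `ref_stage`,
    concatenate on the channel dim, then project with a 1x1 conv."""

    def __init__(self, in_widths=(64, 128, 320, 512), out_channels=256, ref_stage=1):
        super().__init__()
        self.ref_stage = ref_stage
        self.proj = nn.Conv2d(sum(in_widths), out_channels, kernel_size=1)

    def forward(self, feats):
        size = feats[self.ref_stage].shape[-2:]
        resized = [
            f if f.shape[-2:] == size
            else F.interpolate(f, size=size, mode="bilinear", align_corners=False)
            for f in feats
        ]
        return self.proj(torch.cat(resized, dim=1))


backbone = StandInPVTv2()
neck = FuseStages(out_channels=1024, ref_stage=1)  # stride-8 output
feats = backbone(torch.randn(1, 3, 256, 256))
print([tuple(f.shape) for f in feats])
# [(1, 64, 64, 64), (1, 128, 32, 32), (1, 320, 16, 16), (1, 512, 8, 8)]
print(tuple(neck(feats).shape))  # (1, 1024, 32, 32)
```

With `ref_stage=1` the fused map has the same [1, 1024, 32, 32] shape as the ResNet feature map mentioned earlier in the thread (for a 256 x 256 input). Other options are to feed a single stage to the head or to use an FPN-style neck.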

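PS: PVTv2 makes 3 improvements over PVTv1 (a linear-complexity spatial-reduction attention layer, overlapping patch embedding, and a convolutional feed-forward network). The overall network structure is almost the same.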

Aug 23 '21 08:08 EckoTan0804