Stark mask purpose？

Open elvindp opened this issue 2 years ago • 3 comments

Hi, in the implementation, a mask with the same size as the input is used, which confuses me. In NLP Transformers, the mask is used to avoid the influence of zero padding across inputs of different lengths. Is the mask here used to avoid the influence of image padding?

Nov 26 '21 12:11 elvindp

@elvindp Hi, the mask is used to deal with padding. We don't want the network (Transformer) to learn similarity with padded regions, because these regions carry no valuable information. The mask operation has no obvious influence on tracking performance.
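To illustrate the idea, here is a minimal sketch of how a padding mask can keep padded positions out of an attention softmax. It is not STARK's code: the function name `masked_attention_weights` and the toy scores are made up for illustration, and it uses plain Python lists instead of tensors.

```python
import math

def masked_attention_weights(scores, padding_mask):
    """Softmax over key scores, ignoring keys flagged True in padding_mask.

    Hypothetical helper for illustration only (not part of STARK).
    """
    if all(padding_mask):
        raise ValueError("at least one key must be unmasked")
    # Set the scores of padded keys to -inf so that exp() gives exactly 0.
    masked = [float("-inf") if is_pad else s
              for s, is_pad in zip(scores, padding_mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Four key positions; the last two are padding.
scores = [2.0, 1.0, 0.5, 3.0]
padding_mask = [False, False, True, True]
weights = masked_attention_weights(scores, padding_mask)
```

Even though the last key has the highest raw score, it receives zero weight because it is masked, so the query only attends to the real image content.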

Nov 30 '21 01:11 MasterBin-IIAU

@MasterBin-IIAU Hi, thank you for the answer. But I have some other questions:

The lightning model does not have a score head. Is this because its performance is bad?
I found the frozen part.
Without the score, if the target is out of the FOV (i.e. out of the image), the lightning model will always produce a wrong output coordinate.
Why do "sample target" on the complete image? I saved the image and found many black padding regions.
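For context, a minimal sketch of how such black padding can arise: when a square crop around the target extends past the image border, the missing pixels have to be filled with something, and a mask can record which pixels are filler. This is an assumption about the general technique, not STARK's actual "sample target" code; `crop_with_padding` is a hypothetical helper working on a 2D list.

```python
def crop_with_padding(image, cx, cy, size):
    """Crop a size x size window centred at (cx, cy) from a 2D list image.

    Pixels outside the image are filled with 0 (black) and flagged True
    in the returned mask. Hypothetical helper for illustration only.
    """
    height, width = len(image), len(image[0])
    x0, y0 = cx - size // 2, cy - size // 2  # top-left corner of the window
    patch, mask = [], []
    for y in range(y0, y0 + size):
        patch_row, mask_row = [], []
        for x in range(x0, x0 + size):
            inside = 0 <= x < width and 0 <= y < height
            patch_row.append(image[y][x] if inside else 0)
            mask_row.append(not inside)  # True marks a padded pixel
        patch.append(patch_row)
        mask.append(mask_row)
    return patch, mask

# A 4x4 image of constant value 9, cropped around its top-left corner.
image = [[9] * 4 for _ in range(4)]
patch, mask = crop_with_padding(image, cx=0, cy=0, size=4)
padded_pixels = sum(flag for row in mask for flag in row)
```

With the target at the corner, three quarters of the crop is padding, which matches the kind of large black regions described above.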

Dec 06 '21 11:12 elvindp

@elvindp

Without the score, if the target is out of the FOV (i.e. out of the image), the lightning model will always produce a wrong output coordinate.

Yeah, it is annoying. This design limits the deployment of STARK.

Mar 04 '22 05:03 TsingWei
