mask purpose?

Open elvindp opened this issue 2 years ago • 3 comments

Hi, the implementation uses a mask with the same size as the input, which confuses me. In an NLP Transformer, the mask keeps the zero padding added for different input lengths from influencing the result. Is the mask here used to handle the padding added to the image?

elvindp avatar Nov 26 '21 12:11 elvindp

@elvindp Hi, the mask is used to deal with padding. We don't want the network (Transformer) to learn similarity with padding regions, because these regions contain no valuable information. The mask operation has no obvious influence on the tracking performance.
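To illustrate the idea of ignoring padding in attention, here is a minimal sketch in plain Python. It is not taken from the STARK code; the function name `masked_attention_weights` and the example values are hypothetical, and it assumes at least one key position is not padding.

```python
import math


def masked_attention_weights(scores, padding_mask):
    """Softmax over one row of attention scores, ignoring padded keys.

    scores: list of floats, one per key position.
    padding_mask: list of bools, True where the key position is padding.
    Assumes at least one position is not padding.
    """
    # Set scores of padded keys to -inf so that exp() maps them to exactly 0.
    masked = [-math.inf if is_pad else s
              for s, is_pad in zip(scores, padding_mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Four key positions; the last two are padding (hypothetical values).
weights = masked_attention_weights([2.0, 1.0, 3.0, 0.5],
                                   [False, False, True, True])
print(weights)
```

The padded positions receive zero weight regardless of their raw scores, so the remaining weights are distributed over real image content only.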

MasterBin-IIAU avatar Nov 30 '21 01:11 MasterBin-IIAU

@MasterBin-IIAU Hi, thank you for the answer. I have some other questions:

  1. The lightning model does not have a score head. Is this because its performance was bad?
  2. I found the frozen part.
  3. Without the score, if the target is out of the FOV (i.e. out of the image), the lightning model will always produce a wrong output coordinate.
  4. Why is "sample target" done on the complete image? I saved the image and found many black padding regions.
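For context on item 4, black regions typically appear when the crop window extends past the image border and the missing pixels are filled with zeros. A minimal sketch of that behavior, assuming zero fill and a hypothetical helper `crop_with_padding` (not the repository's function), which also returns a mask marking the padded pixels:

```python
def crop_with_padding(image, x0, y0, size):
    """Crop a size x size window whose top-left corner is (x0, y0).

    image: 2D list of pixel values (rows of equal length).
    Returns (crop, mask); mask is True where the window fell outside the image.
    """
    height, width = len(image), len(image[0])
    crop, mask = [], []
    for y in range(y0, y0 + size):
        crop_row, mask_row = [], []
        for x in range(x0, x0 + size):
            inside = 0 <= y < height and 0 <= x < width
            # Pixels outside the image are filled with 0 (black).
            crop_row.append(image[y][x] if inside else 0)
            mask_row.append(not inside)
        crop.append(crop_row)
        mask.append(mask_row)
    return crop, mask


# A 3x3 "image"; the crop window starts at (1, 1) and overhangs the border.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
crop, mask = crop_with_padding(image, x0=1, y0=1, size=3)
print(crop)
print(mask)
```

The mask produced this way is what lets a later stage tell real pixels from padding.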

elvindp avatar Dec 06 '21 11:12 elvindp

@elvindp

  1. Without the score, if the target is out of the FOV (i.e. out of the image), the lightning model will always produce a wrong output coordinate.

Yeah, it is annoying. This design limits the deployment of STARK.

TsingWei avatar Mar 04 '22 05:03 TsingWei