VISTA
The problem of VISTA
Hi, thanks for your great work!
In the paper, VISTA projects the input feature sequences X1 ∈ R^{n×d_f} and X2 ∈ R^{m×d_f} into queries Q ∈ R^{n×d_q} and keys K ∈ R^{m×d_q} (and values V ∈ R^{m×d_v}) via convolutional operators with 3 × 3 kernels, where d_q and d_v are the feature dimensions of the queries (keys) and values. To decouple the classification and regression tasks, Q and K are further projected into Q_i, K_i, i ∈ {sem, geo}, via individual MLPs (implemented as 1D convolutions).
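To make clear what I expected, here is a minimal PyTorch sketch of my reading of that description. It is not taken from the repository: the class name `PaperStyleProjection`, the default channel sizes, and the choice to apply the 3 × 3 convolutions to 2D feature maps before flattening them into sequences are all my own assumptions.

```python
import torch
import torch.nn as nn


class PaperStyleProjection(nn.Module):
    """Hypothetical module following the paper's text, not the released code."""

    TASKS = ("sem", "geo")

    def __init__(self, d_f: int = 64, d_q: int = 32, d_v: int = 64):
        super().__init__()
        # Shared 3x3 convolutions producing Q from X1, and K, V from X2.
        self.q_conv = nn.Conv2d(d_f, d_q, kernel_size=3, padding=1)
        self.k_conv = nn.Conv2d(d_f, d_q, kernel_size=3, padding=1)
        self.v_conv = nn.Conv2d(d_f, d_v, kernel_size=3, padding=1)
        # Separate per-task projections (MLP as 1D convolution, kernel size 1).
        self.q_heads = nn.ModuleDict({t: nn.Conv1d(d_q, d_q, 1) for t in self.TASKS})
        self.k_heads = nn.ModuleDict({t: nn.Conv1d(d_q, d_q, 1) for t in self.TASKS})

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # x1: (B, d_f, H1, W1) with n = H1 * W1; x2: (B, d_f, H2, W2) with m = H2 * W2.
        q = self.q_conv(x1).flatten(2)  # (B, d_q, n)
        k = self.k_conv(x2).flatten(2)  # (B, d_q, m)
        v = self.v_conv(x2).flatten(2)  # (B, d_v, m)
        # Decoupled queries and keys: one (Q_i, K_i) pair per task i.
        qk = {t: (self.q_heads[t](q), self.k_heads[t](k)) for t in self.TASKS}
        return qk, v
```

With this layout, each task gets its own Q_i and K_i while V stays shared, which is how I understand the decoupling in the paper.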
However, this is not the case in the code!