GKT
Questions about the composition of memory?
Hi, thank you very much for your article. I have a question about one of its details. GKT uses the BEV (HWD) reference points to fetch vision tokens from the images, forming a VSCKK feature. Is the Decoder's memory then made up of HWD tokens, each of which is a VSCKK feature?