hku comments


How the spatial temporal embedding is defined

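> I updated the code to a new implementation, based on [Latte](https://github.com/Vchitect/Latte/blob/main/models/latte.py#L270). In essence, it keeps the 2D xy embedding separate from the temporal embedding. I think your approach is fine too; the model will quickly learn that a 2+1D mapping and a 3D positional encoding are roughly the same.

Hmm, I lean toward `spatial_temporal_embed = np.concatenate([emb_h, emb_w, emb_t], axis=1)`. First, when patches describe a video, I consider the temporal and spatial dimensions equivalent, so this form is simpler and more intuitive. Second, besides encoding "absolute position information", it also preserves "relative position information" along each of x, y, and t separately.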
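To make the concatenation concrete, here is a minimal sketch of how `emb_h`, `emb_w`, and `emb_t` might be built and joined. It assumes fixed sine-cosine embeddings per axis, which the comment does not actually specify; the helper names `sincos_1d` and `spatial_temporal_embed_3d`, the per-axis dimensions, and the patch ordering (t outermost, then h, then w) are all hypothetical choices for illustration, not the repository's or Latte's code.

```python
import numpy as np


def sincos_1d(embed_dim, positions):
    """Sine-cosine embedding of 1D positions (hypothetical helper).

    Returns an array of shape (positions.size, embed_dim).
    """
    assert embed_dim % 2 == 0, "embed_dim must be even"
    # Geometric progression of frequencies, from 1 down towards 1/10000.
    omega = np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0)
    omega = 1.0 / 10000**omega
    # Outer product: one row of angles per position.
    angles = np.einsum("m,d->md", positions.reshape(-1).astype(np.float64), omega)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def spatial_temporal_embed_3d(dim_h, dim_w, dim_t, grid_h, grid_w, grid_t):
    """Concatenate per-axis embeddings into one vector per patch.

    Assumed patch order: t outermost, then h, then w.
    """
    t, h, w = np.meshgrid(
        np.arange(grid_t), np.arange(grid_h), np.arange(grid_w), indexing="ij"
    )
    emb_h = sincos_1d(dim_h, h)  # (N, dim_h)
    emb_w = sincos_1d(dim_w, w)  # (N, dim_w)
    emb_t = sincos_1d(dim_t, t)  # (N, dim_t)
    # The expression from the comment: each axis gets its own slice.
    return np.concatenate([emb_h, emb_w, emb_t], axis=1)


emb = spatial_temporal_embed_3d(
    dim_h=8, dim_w=8, dim_t=4, grid_h=2, grid_w=3, grid_t=5
)
print(emb.shape)  # (30, 20)
```

Because each axis occupies its own slice of the vector, two patches that differ only in t share identical h and w slices. Under the sine-cosine assumption, this is one way to read the point about keeping per-axis relative position information.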