act-plus-plus
Why use the output of the first decoder layer in the ACT model?
hs = self.transformer(src, None, self.query_embed.weight, pos, latent_input, proprio_input, self.additional_pos_embed.weight)[0]
In the ACT model, should this index be -1 (the last decoder layer) instead of 0?
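To make the question concrete, here is a minimal sketch of what I think the index selects. It assumes the transformer returns the outputs of all decoder layers stacked along the first dimension, which I have not verified for every configuration. `toy_decoder` and its arguments are hypothetical stand-ins, not the ACT or DETR code.

```python
# Toy stand-in for a decoder that stacks per-layer outputs.
# All names here are hypothetical; this is NOT the ACT or DETR implementation.

def toy_decoder(x, num_layers, return_intermediate=True):
    """Apply `num_layers` toy layers; each layer adds 1.0 to every element."""
    intermediate = []
    out = list(x)
    for _ in range(num_layers):
        out = [v + 1.0 for v in out]
        intermediate.append(list(out))  # keep a copy of this layer's output
    if return_intermediate:
        return intermediate  # one entry per decoder layer
    return [out]  # only the final layer, still wrapped in a list


stacked = toy_decoder([0.0, 0.0], num_layers=3)

first_layer_out = stacked[0]   # what indexing with [0] would select
last_layer_out = stacked[-1]   # what indexing with [-1] would select

print(first_layer_out)
print(last_layer_out)
```

If the real output is stacked this way, `[0]` reads only the first decoder layer and the later layers never contribute to the prediction, while `[-1]` reads the final layer.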