fairseq

ABOUT SRC_TOKENS

Open VTaPo opened this issue 2 years ago • 0 comments

In the TransformerEncoderBase class, the forward() function has a parameter 'src_tokens': tokens in the source language, of shape (batch, src_len). It is a tensor of indices, for example: [ [10, 52, 138, ...], [53, 108, 52, ...], ..., [28, 82, 106, ...] ]

How can I get the word in the raw input text that corresponds to each index? For example: [ ['I', 'want', 'to', ...], ['Today', 'I', 'have', ...], ..., ['this', 'movie', 'is', ...] ]
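To make the question concrete, here is a minimal sketch of the lookup being asked for, written in plain Python so it runs without fairseq. The vocabulary `VOCAB`, the padding index `PAD_INDEX`, and the helper `indices_to_tokens` are all hypothetical names made up for illustration. In fairseq itself the mapping would presumably come from the source dictionary used to binarize the data (as far as I understand, a `fairseq.data.Dictionary` can be indexed by an integer to get the symbol), but that is an assumption and not verified here. Also note that if the data was tokenized or BPE-encoded before binarization, the indices map to those units and not necessarily to whole words of the raw text.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary: the position in the list is the token index.
# In a real setup this would be replaced by the dictionary built during preprocessing.
VOCAB: List[str] = ["<s>", "<pad>", "</s>", "<unk>", "I", "want", "to", "Today", "have"]
PAD_INDEX = 1  # assumed padding index for this toy vocabulary


def indices_to_tokens(
    src_tokens: Sequence[Sequence[int]],
    vocab: Sequence[str],
    pad_index: int = PAD_INDEX,
) -> List[List[str]]:
    """Map a (batch, src_len) grid of indices back to token strings, skipping padding."""
    sentences = []
    for row in src_tokens:
        tokens = []
        for idx in row:
            idx = int(idx)  # also accepts 0-d tensor elements
            if idx == pad_index:
                continue  # drop padding positions
            # Fall back to "<unk>" for indices outside the vocabulary.
            tokens.append(vocab[idx] if 0 <= idx < len(vocab) else "<unk>")
        sentences.append(tokens)
    return sentences


# A small batch shaped like src_tokens; the second row is left-padded.
batch = [[4, 5, 6, 2], [1, 7, 4, 8]]
words = indices_to_tokens(batch, VOCAB)
print(words)  # [['I', 'want', 'to', '</s>'], ['Today', 'I', 'have']]
```

With a real tensor, calling `src_tokens.tolist()` first gives the same nested-list shape that this helper expects.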

Thank you very much!

VTaPo, Dec 04 '23 15:12