muzic
[meloform] Why are all token ids in the dictionary 0?
I noticed that the gen_dictionary function uses the variable 'num' to represent each token's id, but 'num' stays 0 for the whole process, so every
token in the dictionary gets 0. Is that a bug, or is it intentional?
Actually, those numbers are not token ids. The dictionary file is written in the format that the fairseq framework expects, where each line holds a token followed by its count. You can refer to https://github.com/facebookresearch/fairseq/blob/f131336fc303992cf309be3953bf523e1654fa1f/fairseq/data/dictionary.py#L125 for how fairseq loads the dictionary, especially the add_symbol() function. The variable "num" is just the initial count of each token; the id is assigned by fairseq when it loads the file, following the order in which the tokens appear.
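
To make the difference between count and id concrete, here is a minimal sketch of a loader in the same spirit. It is a simplified stand-in and not fairseq's actual code: the function name load_dictionary and the token names are hypothetical, and details such as duplicate handling and the counts of the special symbols are placeholders.

```python
# Simplified stand-in for reading a "<token> <count>" dictionary file.
# NOT fairseq's implementation; names and details here are assumptions.

# fairseq reserves a few special symbols before any file entries.
SPECIALS = ["<s>", "<pad>", "</s>", "<unk>"]


def load_dictionary(lines):
    """Return (indices, counts) built from '<token> <count>' lines."""
    indices = {}  # token -> id, assigned in order of first appearance
    counts = []   # counts[id] -> count read from the file
    for sym in SPECIALS:
        indices[sym] = len(indices)
        counts.append(0)  # placeholder count for special symbols
    for line in lines:
        # Split on the last space: left part is the token, right part the count.
        token, count = line.rstrip().rsplit(" ", 1)
        if token in indices:
            counts[indices[token]] += int(count)
        else:
            indices[token] = len(indices)  # the id comes from position, not from the file
            counts.append(int(count))
    return indices, counts


# Hypothetical dictionary lines, every count written as 0 like 'num' in gen_dictionary.
dict_lines = ["bar_0 0", "pos_0 0", "pitch_60 0"]
indices, counts = load_dictionary(dict_lines)

for token in ["bar_0", "pos_0", "pitch_60"]:
    print(token, "id =", indices[token], "count =", counts[indices[token]])
```

In this sketch every token still gets a distinct id even though every count in the file is 0, which is why writing 0 for all entries does not cause collisions.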