yoyodyne
Concatenated features break symbol decoding
For models where the features are concatenated to the source string, the concatenation now happens in the collator. To avoid index clashes, we simply add the source_token vocabulary length to each feature index: https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/data/collators.py#L71
However, the symbol maps do not track this offset. Thus, if we want to decode a source (e.g. as a sanity check in the logs), this will not work, since the feature indices are out of range: they can no longer be meaningfully mapped back to their surface form. Not a critical bug, but definitely odd behavior.
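To make the failure concrete, here is a minimal sketch in plain Python. The toy vocabularies and the helper names (`concatenate`, `naive_decode`, `offset_aware_decode`) are hypothetical and are not yoyodyne's actual symbol map or collator API; they only mimic the offsetting described above and show one possible way a decoder could undo it.

```python
# Hypothetical toy vocabularies; NOT yoyodyne's real symbol map classes.
source_symbols = ["<pad>", "a", "b", "c"]
feature_symbols = ["<pad>", "[PL]", "[PST]"]


def concatenate(source_ids, feature_ids, source_vocab_size):
    """Appends feature indices to the source, shifted past the source vocabulary."""
    return source_ids + [i + source_vocab_size for i in feature_ids]


def naive_decode(ids, source_symbols):
    """Looks every index up in the source map, ignoring the offset."""
    return [source_symbols[i] for i in ids]


def offset_aware_decode(ids, source_symbols, feature_symbols):
    """Subtracts the source vocabulary size before looking up feature indices."""
    n = len(source_symbols)
    return [
        source_symbols[i] if i < n else feature_symbols[i - n]
        for i in ids
    ]


# Source "a b c" with features "[PL] [PST]"; feature indices are shifted by 4.
ids = concatenate([1, 2, 3], [1, 2], len(source_symbols))

try:
    naive_decode(ids, source_symbols)
except IndexError:
    print("naive decoding fails: feature indices are out of range")

print(offset_aware_decode(ids, source_symbols, feature_symbols))
```

In this sketch the decoder has to know the source vocabulary size, which is exactly the information the symbol maps currently do not carry.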