yoyodyne

Concatenated features break symbol decoding

Open: Adamits opened this issue 1 year ago • 5 comments

For models that concatenate the features to the source string, we now handle the concatenation in the collator. To avoid index clashes, we simply add the length of the source_token vocabulary to each feature index: https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/data/collators.py#L71
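Here is a minimal sketch of the offsetting idea. It assumes source and feature indices both start at 0 in their own vocabularies. `concatenate_with_offset` is a hypothetical helper and is not yoyodyne's actual collator code.

```python
from typing import List


def concatenate_with_offset(
    source: List[int], features: List[int], source_vocab_size: int
) -> List[int]:
    """Appends feature indices to source indices (hypothetical helper).

    Each feature index is shifted by source_vocab_size, so it can never
    collide with a source index in the range [0, source_vocab_size).
    """
    shifted = [index + source_vocab_size for index in features]
    return source + shifted


# Toy example: a source vocabulary of size 5, with features indexed from 0.
source_ids = [1, 3, 4]
feature_ids = [0, 2]
print(concatenate_with_offset(source_ids, feature_ids, 5))  # [1, 3, 4, 5, 7]
```

After the shift, feature indices 0 and 2 become 5 and 7. Both values fall outside the source vocabulary's range.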

However, the symbol maps do not track this offset. If we want to decode a source (e.g., as a sanity check in the logs), decoding fails. The shifted feature indices are out of range for the source symbol map, so they can no longer be meaningfully mapped back to their surface forms. This is not a critical bug, but it is definitely odd behavior.
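The sketch below shows one way a decoder could account for the offset. It assumes a single offset equal to the size of the source vocabulary. `OffsetAwareDecoder` and the toy symbol lists are hypothetical and do not correspond to yoyodyne's symbol map API. A naive lookup that uses only the source symbols fails on the shifted feature indices. The offset-aware version shifts those indices back before looking them up in the feature vocabulary.

```python
from typing import List


class OffsetAwareDecoder:
    """Decodes a concatenated source+features index sequence (hypothetical).

    Indices below the offset are looked up in the source vocabulary. Indices
    at or above it are shifted back and looked up in the feature vocabulary.
    """

    def __init__(self, source_symbols: List[str], feature_symbols: List[str]):
        self.source_symbols = source_symbols
        self.feature_symbols = feature_symbols
        # Assumed to match the offset added to every feature index.
        self.offset = len(source_symbols)

    def decode(self, indices: List[int]) -> List[str]:
        symbols = []
        for index in indices:
            if index < self.offset:
                symbols.append(self.source_symbols[index])
            else:
                # Undo the shift before looking up the feature symbol.
                symbols.append(self.feature_symbols[index - self.offset])
        return symbols


source_symbols = ["<pad>", "a", "b", "c", "d"]
feature_symbols = ["V", "PST", "PL"]
encoded = [1, 3, 4, 5, 7]

# A naive lookup that ignores the offset runs past the end of source_symbols.
try:
    naive = [source_symbols[i] for i in encoded]
except IndexError:
    naive = None
print(naive)  # None

decoder = OffsetAwareDecoder(source_symbols, feature_symbols)
print(decoder.decode(encoded))  # ['a', 'c', 'd', 'V', 'PL']
```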

Adamits, Nov 02 '23 20:11