tensor2tensor
Double quote and single quote characters are not handled correctly in vocab files where words are not wrapped in quotes
In particular, the following branch strips the quote and leaves an empty string, because a token that is a lone quote character both starts and ends with a quote:
https://github.com/tensorflow/tensor2tensor/blob/5f9dd2db6d7797162e53adf152310ed13e9fc711/tensor2tensor/data_generators/text_encoder.py#L929
An easy fix is to also check that "len(s) > 1" in both conditions.
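
A minimal standalone sketch of the problem and the proposed fix follows. The function names `strip_quotes_original` and `strip_quotes_fixed` are hypothetical and written only for illustration; the first is my reading of the linked branch, not a verbatim copy of the library code. The length check is applied once in front of both conditions, which has the same effect as adding it to each.

```python
def strip_quotes_original(s):
    """Assumed behaviour of the linked branch: strip wrapping quotes."""
    if ((s.startswith("'") and s.endswith("'")) or
            (s.startswith('"') and s.endswith('"'))):
        s = s[1:-1]
    return s


def strip_quotes_fixed(s):
    """Same check, but only when the token is longer than one character."""
    if len(s) > 1 and ((s.startswith("'") and s.endswith("'")) or
                       (s.startswith('"') and s.endswith('"'))):
        s = s[1:-1]
    return s


# A lone quote token satisfies both startswith and endswith,
# so the original check slices it down to an empty string.
print(repr(strip_quotes_original("'")))
print(repr(strip_quotes_fixed("'")))
print(repr(strip_quotes_fixed("'the'")))
```

With the extra check, a lone `'` or `"` in an unquoted vocab file is kept as a token, while quoted entries such as `'the'` are still unwrapped.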