revtok
SubwordSegmenter fails when max_size is None
Hey,
when calling this module from torchtext, the default max_size is None, which gets propagated to SubwordSegmenter and causes a not-so-obvious error (in the tqdm loop, or even more obfuscated when using Julia).
Could max_size be set to the number of unique ngrams if it is None?
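A minimal sketch of that fallback, not taken from revtok's source: `resolve_max_size` and the `Counter` of n-grams are hypothetical stand-ins for whatever SubwordSegmenter keeps internally.

```python
from collections import Counter


def resolve_max_size(max_size, ngram_counts):
    """Hypothetical helper: pick a usable vocabulary size limit.

    If max_size is None, fall back to the number of unique n-grams;
    otherwise return the caller's value unchanged.
    """
    if max_size is None:
        return len(ngram_counts)
    return max_size


# Toy n-gram counts standing in for the segmenter's real statistics.
counts = Counter(["ab", "bc", "ab", "cd"])

vocab_size = resolve_max_size(None, counts)   # falls back to the unique n-gram count
explicit_size = resolve_max_size(2, counts)   # an explicit limit is kept as-is
```

Resolving the value once, before the tqdm loop starts, would mean the loop never sees None.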
Is this fixed?