revtok

SubwordSegmenter fails when max_size is None

Open mttk opened this issue 6 years ago • 1 comment

Hey,

when calling this module from torchtext, the default max_size is None, which gets propagated to SubwordSegmenter and causes a not-so-obvious error (in the tqdm loop, or even more obfuscated when using Julia).

Could max_size be set to the number of unique ngrams if it is None?
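Something like the following is what I have in mind. It is only a rough sketch: count_ngrams and resolve_max_size are hypothetical helpers written for illustration, not functions from revtok or torchtext, and the actual counting inside SubwordSegmenter may differ.

```python
from collections import Counter


def count_ngrams(tokens, max_n=3):
    """Count every character n-gram of length 1..max_n in each token.

    Hypothetical stand-in for whatever counting the segmenter does internally.
    """
    counts = Counter()
    for token in tokens:
        for n in range(1, max_n + 1):
            for start in range(len(token) - n + 1):
                counts[token[start:start + n]] += 1
    return counts


def resolve_max_size(ngram_counts, max_size=None):
    """Fall back to the number of unique n-grams when max_size is None."""
    if max_size is None:
        return len(ngram_counts)
    return max_size


counts = count_ngrams(["ab", "abc"])
print(resolve_max_size(counts))      # no limit given: use the unique n-gram count
print(resolve_max_size(counts, 4))   # explicit limit is kept as-is
```

With a fallback like this, a None coming from the caller would never reach the loop that compares against max_size.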

mttk, May 18 '18 08:05

Is this fixed?

GeoffNN, Oct 12 '18 20:10