Martin Tutek

5 issues by Martin Tutek

Fixes #23. arXiv changed its layout, so I updated the line indices / starting points accordingly. I also gave them names so that future updates are easier to make and debug. I...

Delegate the `unk_token` to the arguments when constructing the vocabulary. Fixes #618, a relatively major issue.
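The idea of passing `unk_token` in as a constructor argument, rather than fixing it inside the vocabulary, can be sketched roughly as follows. This is a minimal illustration under assumptions: the `Vocab` class, its `lookup` method, and the `specials`/`max_size` parameters are hypothetical and are not torchtext's actual API or the code from the fix.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


class Vocab:
    """Hypothetical vocabulary class (not torchtext's Vocab).

    The unknown token is a constructor argument instead of a hard-coded
    string, so the caller decides whether OOV lookups map to an <unk> index.
    """

    def __init__(
        self,
        counter: Counter,
        unk_token: Optional[str] = "<unk>",
        specials: Iterable[str] = ("<pad>",),
        max_size: Optional[int] = None,
    ) -> None:
        self.unk_token = unk_token
        # Special tokens come first; unk_token is included only if provided.
        head = ([unk_token] if unk_token is not None else []) + list(specials)
        self.itos: List[str] = list(dict.fromkeys(head))  # dedupe, keep order
        # Remaining tokens by descending frequency, ties broken alphabetically.
        words = sorted(
            (w for w in counter if w not in self.itos),
            key=lambda w: (-counter[w], w),
        )
        if max_size is not None:
            words = words[:max_size]
        self.itos.extend(words)
        self.stoi: Dict[str, int] = {w: i for i, w in enumerate(self.itos)}

    def lookup(self, token: str) -> int:
        if token in self.stoi:
            return self.stoi[token]
        if self.unk_token is None:
            # No unk_token was requested, so an OOV token is an error.
            raise KeyError(f"{token!r} is out of vocabulary and no unk_token was set")
        return self.stoi[self.unk_token]
```

With the default, `Vocab(Counter(["a", "a", "b"]))` maps an unseen token to index 0 (`<unk>`); passing `unk_token=None` instead makes an out-of-vocabulary lookup raise `KeyError`, so the behaviour is chosen by the caller rather than baked into the class.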

Closes #273. Draft: function calls and names are subject to change, but the gist is here.

At some point, we could implement this (and the creation of `data`) using views, though now is not the time. It would be interesting to compare some performance metrics between this...

Hey, when calling this module from torchtext, the default `max_size` is None, which gets propagated to `SubwordSegmenter` and causes a not-so-obvious error (in the tqdm loop, or even more obfuscated...
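One way to make a `None` `max_size` fail clearly is to validate it before it reaches the segmenter's training loop. This is a minimal sketch under assumptions: `validate_max_size` and `learn_merges` are hypothetical names standing in for the real code path, and raising an error (rather than substituting a default size) is a design choice made for illustration, not necessarily what the eventual fix did.

```python
from typing import Optional


def validate_max_size(max_size: Optional[int]) -> int:
    # Hypothetical guard run at the API boundary. Without it, a bare
    # range(None) deep inside a loop raises the unhelpful
    # "TypeError: 'NoneType' object cannot be interpreted as an integer".
    if max_size is None:
        raise ValueError(
            "max_size is None; pass an explicit positive integer "
            "(the subword segmenter needs a concrete vocabulary size)"
        )
    if isinstance(max_size, bool) or not isinstance(max_size, int) or max_size <= 0:
        raise ValueError(f"max_size must be a positive integer, got {max_size!r}")
    return max_size


def learn_merges(max_size: Optional[int]) -> int:
    # Hypothetical stand-in for the segmenter's training loop; it just
    # counts iterations to show where max_size is consumed.
    steps = validate_max_size(max_size)
    done = 0
    for _ in range(steps):
        done += 1
    return done
```

Raising a `ValueError` at the call boundary names the offending argument, whereas letting `None` flow into `range()` fails far from where the value was set.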