Implement a tool to calculate a BPE vocabulary
lttoolbox should include a tool that calculates a BPE (byte pair encoding) vocabulary as defined in this paper: https://arxiv.org/pdf/1508.07909.pdf
The idea is to use BPE to weight our morphological transducers.
Since the algorithm is already defined in the paper, this would be a matter of implementing the same algorithm in lttoolbox, correct? Or are there additional factors to account for in the implementation?
@anjalibhavan Well, the first version would be just the algorithm as described in the paper. Later, the tool would support weighting lttoolbox transducers according to the BPE vocabulary it computes.
A Python implementation of the algorithm is available at https://github.com/rsennrich/subword-nmt
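For anyone picking this up, here is a rough sketch of the merge-learning step described in the paper: count adjacent symbol pairs weighted by word frequency, merge the most frequent pair, and repeat. This is only an illustration, not lttoolbox code and not the subword-nmt API. The function names (`count_pairs`, `merge_pair`, `learn_bpe`) are hypothetical, the `</w>` end-of-word marker follows the paper, and ties are broken by whichever pair was seen first, which is an assumption of this sketch.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with the concatenated symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Return the list of merge operations, most frequent pair first.

    Hypothetical helper: words are split into characters plus an
    end-of-word marker, as in the paper. Ties go to the pair that was
    counted first (an assumption of this sketch).
    """
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


if __name__ == "__main__":
    # Toy word frequencies taken from the example in the paper.
    for merge in learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3):
        print(merge)
```

The resulting merge list (plus the initial characters) is the BPE vocabulary. How that vocabulary would map onto transducer weights is a separate design question that this sketch does not address.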