
Implement a tool to calculate a BPE vocabulary

Status: Open · ftyers opened this issue 6 years ago · 3 comments

lttoolbox should include a tool that calculates a BPE (byte-pair encoding) vocabulary as defined in this paper: https://arxiv.org/pdf/1508.07909.pdf
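A minimal sketch of the vocabulary-learning step, written in Python after the paper's Algorithm 1. It assumes the input is a `{word: count}` dictionary, keeps each word as a tuple of symbols rather than a space-joined string, and breaks ties by taking the first most-frequent pair encountered. The names `learn_bpe`, `pair_stats` and `merge_pair` are hypothetical and not part of any existing lttoolbox API.

```python
from collections import Counter

END_OF_WORD = "</w>"  # end-of-word marker, as in the paper's example vocabulary


def pair_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[left, right] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    left, right = pair
    merged_vocab = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                out.append(left + right)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged_vocab[tuple(out)] = freq
    return merged_vocab


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from a {word: count} dict."""
    # Start from single characters plus the end-of-word marker.
    vocab = {tuple(word) + (END_OF_WORD,): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = pair_stats(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; max() returns the first one seen on ties.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


if __name__ == "__main__":
    # Toy word counts, the same ones used in the paper's Algorithm 1.
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    for pair in learn_bpe(counts, 5):
        print(pair)
```

With this tie-breaking rule, the first five merges on the toy counts are `('e', 's')`, `('es', 't')`, `('est', '</w>')`, `('l', 'o')` and `('lo', 'w')`. The learned vocabulary is the initial character set plus one new symbol per merge.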

The idea is to use BPE to weight our morphological transducers.

ftyers · Feb 23 '20 01:02

Since the algorithm is already defined in the paper, this would mostly be a matter of implementing the same algorithm in lttoolbox, correct? Or are there additional factors you would need in the implementation?

anjalibhavan · Feb 27 '20 12:02

@anjalibhavan Well, the first version would be just the algorithm as described in the paper. Later, the tool would also support weighting lttoolbox transducers according to the vocabulary it computes.
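Any weighting along these lines first needs a way to split a surface form into the learned units. The following is a minimal sketch that assumes the merges are replayed in the order they were learned, starting from single characters plus an end-of-word marker, in the spirit of the paper's procedure for unseen words. `segment`, `apply_merge` and the `MERGES` list are hypothetical illustrations. How the resulting units would map to transducer weights is left open, as it is in this thread.

```python
END_OF_WORD = "</w>"  # must match the marker used when the merges were learned


def apply_merge(symbols, pair):
    """Merge every adjacent occurrence of `pair`, scanning left to right."""
    left, right = pair
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def segment(word, merges):
    """Split `word` into BPE units by replaying `merges` in learned order."""
    symbols = list(word) + [END_OF_WORD]
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Hypothetical merge list, e.g. the output of a learning step on a toy corpus.
MERGES = [("e", "s"), ("es", "t"), ("est", "</w>"), ("l", "o"), ("lo", "w")]

if __name__ == "__main__":
    print(segment("lowest", MERGES))  # ['low', 'est</w>']
    print(segment("newer", MERGES))   # no merge applies, so characters + marker
```

Replaying every merge costs time proportional to the number of merges times the word length. Looking up each adjacent pair's merge rank and repeatedly merging the lowest-ranked one is a common way to avoid scanning the whole list.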

ftyers · Mar 06 '20 15:03

There is an existing Python implementation at https://github.com/rsennrich/subword-nmt

ftyers · Jun 20 '20 07:06