Include non-breaking prefixes file for source language
Currently bergamot-translator does not load non-breaking prefixes at all (https://github.com/browsermt/bergamot-translator/issues/104). This is bad and should be fixed. I think the clean way to do this is to ship the file for the source language. The files are small enough that some copying is probably OK.
Can you bring the relevant nonbreaking_prefixes.xx into the archive, @XapaJIaMnu? I'll pick this up at BRT to include tests for https://github.com/browsermt/bergamot-translator/pull/172.
Where exactly do we get those from? Is that part of ssplit, @ugermann?
They come from Moses: https://github.com/moses-smt/mosesdecoder/tree/master/scripts/share/nonbreaking_prefixes
They actually ship with the sentence splitter and may diverge from Moses over time, as we add additional prefixes.
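For anyone unfamiliar with these files, here is a rough Python sketch of how a Moses-style prefix file can be read and why a splitter needs it. It assumes the usual Moses layout: one prefix per line, lines starting with `#` are comments, and a trailing `#NUMERIC_ONLY#` marks prefixes that only block a split when a digit follows. The function names `parse_prefixes` and `split_sentences` are made up for illustration, and the splitting rule is deliberately simplified; it is not how ssplit or bergamot-translator actually implement this.

```python
NUMERIC_ONLY = "#NUMERIC_ONLY#"


def parse_prefixes(text):
    """Parse the contents of a Moses-style non-breaking prefix file.

    Returns two sets: prefixes that never end a sentence, and prefixes
    that do not end a sentence only when a number follows.
    """
    always, numeric_only = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(NUMERIC_ONLY):
            numeric_only.add(line[: -len(NUMERIC_ONLY)].strip())
        else:
            always.add(line)
    return always, numeric_only


def split_sentences(text, always, numeric_only):
    """Naive splitter: break after a word ending in '.', unless the word
    (without the period) is a known non-breaking prefix."""
    words = text.split()
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        if not word.endswith(".") or i + 1 == len(words):
            continue
        stem, following = word[:-1], words[i + 1]
        if stem in always:
            continue  # e.g. "Dr." never ends a sentence
        if stem in numeric_only and following[0].isdigit():
            continue  # e.g. "No. 5" stays together
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    sample_file = "# titles\nDr\nMr\nNo #NUMERIC_ONLY#\n"
    always, numeric_only = parse_prefixes(sample_file)
    for s in split_sentences("Dr. Smith arrived. See No. 5 for details.",
                             always, numeric_only):
        print(s)
```

Without the prefix file loaded, both sets are empty and the same splitter breaks after "Dr." and "No.", which is the behaviour this issue is about.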