Include non-breaking prefixes file for source language

Open kpu opened this issue 4 years ago • 4 comments

Currently bergamot-translator does not load non-breaking prefixes at all (https://github.com/browsermt/bergamot-translator/issues/104). This is bad and should be fixed. I think the clean way to do this is to ship the file for the source language. The files are small enough that some copying is probably OK.

kpu avatar May 09 '21 10:05 kpu

Can you bring the relevant nonbreaking_prefixes.xx into the archive, @XapaJIaMnu? I'll pick this up at BRT to include tests for https://github.com/browsermt/bergamot-translator/pull/172.

jerinphilip avatar May 27 '21 10:05 jerinphilip

Where exactly do we get those from? Is that part of ssplit, @ugermann?

XapaJIaMnu avatar May 27 '21 12:05 XapaJIaMnu

They come from moses. https://github.com/moses-smt/mosesdecoder/tree/master/scripts/share/nonbreaking_prefixes

kpu avatar May 27 '21 12:05 kpu

They actually ship with the sentence splitter and may diverge from Moses over time, as we add additional prefixes.

ugermann avatar May 27 '21 13:05 ugermann
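
For readers unfamiliar with the file format discussed above, here is a minimal sketch of how a Moses-style non-breaking prefixes file could be parsed. It assumes the conventions used by the Moses files linked in the thread: one prefix per line, `#` starting a comment line, and a trailing `#NUMERIC_ONLY#` marker for prefixes that only suppress a sentence break when a number follows. The function names `load_nonbreaking_prefixes` and `suppresses_break` are hypothetical and are not part of bergamot-translator or ssplit; the actual loading code in those projects may differ.

```python
NUMERIC_MARKER = "#NUMERIC_ONLY#"


def load_nonbreaking_prefixes(text):
    """Parse the contents of a Moses-style prefix file.

    Returns (always, numeric_only): two sets of prefixes without the
    trailing period. Hypothetical helper, for illustration only.
    """
    always, numeric_only = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        if line.endswith(NUMERIC_MARKER):
            # Prefix that is non-breaking only before a number, e.g. "No".
            numeric_only.add(line[: -len(NUMERIC_MARKER)].strip())
        else:
            always.add(line)
    return always, numeric_only


def suppresses_break(token, next_token, always, numeric_only):
    """Return True if the period ending `token` should not end a sentence."""
    if not token.endswith("."):
        return False
    prefix = token[:-1]
    if prefix in always:
        return True
    # Numeric-only prefixes count only when the next token starts with a digit.
    return prefix in numeric_only and next_token[:1].isdigit()


sample = "# Titles\nDr\nMr\n\n# Numeric only\nNo #NUMERIC_ONLY#\n"
always, numeric_only = load_nonbreaking_prefixes(sample)
```

With the sample above, `suppresses_break("Dr.", "Smith", always, numeric_only)` is true, while `suppresses_break("No.", "Thanks", always, numeric_only)` is false because `No` is marked numeric-only.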