malaysian-dataset

preparing abstractive normalization

Open huseinzol05 opened this issue 2 years ago • 0 comments

Pipeline: rule-based normalization of Bahasa -> ms-en translation with a model trained on noisy text -> standard en -> en-ms translation.

  1. Rule-based Bahasa normalization, using malaya.normalize.
  2. Noisy ms-en model: Google Translate is really good enough for this step.
  3. en-ms translation: based on the test set, the en-ms translation from malaya is more acceptable for us.
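
The three steps above could be chained roughly as follows. This is only a minimal sketch: `build_pipeline` and the `toy_*` functions are hypothetical stand-ins written for illustration, not malaya or Google Translate APIs. In practice each callable would wrap the real normalizer or translator.

```python
from typing import Callable, Dict

# Each stage is assumed to be a plain str -> str function.
Step = Callable[[str], str]


def build_pipeline(normalize: Step, ms_to_en: Step, en_to_ms: Step) -> Callable[[str], Dict[str, str]]:
    """Chain rule-based normalization, noisy ms-en and en-ms translation."""

    def run(text: str) -> Dict[str, str]:
        normalized = normalize(text)      # step 1: rule-based cleanup
        english = ms_to_en(normalized)    # step 2: noisy ms -> standard en
        standard_ms = en_to_ms(english)   # step 3: en -> standard ms
        # Keep intermediate outputs so each stage can be inspected.
        return {
            "input": text,
            "normalized": normalized,
            "english": english,
            "output": standard_ms,
        }

    return run


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_normalize(text: str) -> str:
    rules = {"x": "tak", "nk": "nak"}  # tiny shorthand lookup table
    return " ".join(rules.get(tok, tok) for tok in text.split())


def toy_ms_to_en(text: str) -> str:
    return {"saya tak nak makan": "I do not want to eat"}.get(text, text)


def toy_en_to_ms(text: str) -> str:
    return {"I do not want to eat": "Saya tidak mahu makan"}.get(text, text)


pipeline = build_pipeline(toy_normalize, toy_ms_to_en, toy_en_to_ms)
result = pipeline("saya x nk makan")
print(result["output"])
```

Passing the stages in as callables keeps the sketch independent of any particular backend, so step 2 or step 3 can be swapped without touching the rest.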

huseinzol05 · Aug 21 '22 08:08