punctuator2 Preprocessing scripts for TED dataset

Could you possibly add the preprocessing scripts for the TED dataset? It would help to reproduce the results in your Interspeech paper.

Jul 16 '18 20:07 seokhwankim

I would like the preprocessing scripts too. I trained a model on training data that I split myself and got worse results than the author's.

Mar 25 '19 07:03 MingLunHan

The TED dataset was preprocessed by the authors of http://www.lrec-conf.org/proceedings/lrec2016/pdf/103_Paper.pdf and the resulting dataset is shared at: https://drive.google.com/file/d/0B13Cc1a7ebTuMElFWGlYcUlVZ0k/view

I used this simple script to convert the format of the files: https://drive.google.com/open?id=1sW23C4kqRJ6rDSBurco8_0lJ3VZJIkta

Mar 25 '19 08:03 ottokart
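For readers who cannot open the linked script: it is not reproduced in this thread, so the following is only a rough sketch of what such a format conversion might look like, not the author's code. It assumes the shared files hold one `word<TAB>label` pair per line with labels `O`, `COMMA`, `PERIOD` and `QUESTION`, and that the target is a space-separated stream using punctuation tokens such as `,COMMA`. Both formats, the label mapping, and the function name `convert_lines` are assumptions for illustration and should be checked against the actual files.

```python
# Assumed mapping from per-word labels to punctuation tokens.
# Verify both sides against the real dataset and the training code.
LABEL_TO_TOKEN = {
    "COMMA": ",COMMA",
    "PERIOD": ".PERIOD",
    "QUESTION": "?QUESTIONMARK",
}


def convert_lines(lines):
    """Turn 'word<TAB>label' lines into one space-separated token stream."""
    tokens = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # Treat a missing label as "no punctuation" (assumption).
        label = parts[1] if len(parts) > 1 else "O"
        tokens.append(word)
        token = LABEL_TO_TOKEN.get(label)
        if token is not None:
            tokens.append(token)
    return " ".join(tokens)


if __name__ == "__main__":
    sample = ["hello\tCOMMA", "world\tPERIOD", "how\tO", "are\tO", "you\tQUESTION"]
    print(convert_lines(sample))
    # hello ,COMMA world .PERIOD how are you ?QUESTIONMARK
```

Unknown labels are silently treated as "no punctuation" here; raising an error instead may be safer when validating a new copy of the data.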

Thank you very much.

Mar 25 '19 09:03 MingLunHan