
Preprocessing scripts for TED dataset

Open seokhwankim opened this issue 7 years ago • 3 comments

Could you add the preprocessing scripts for the TED dataset? They would help in reproducing the results from your Interspeech paper.

seokhwankim avatar Jul 16 '18 20:07 seokhwankim

I would like the preprocessing scripts too. I trained a model on training data that I split myself and got worse results than the author's.

MingLunHan avatar Mar 25 '19 07:03 MingLunHan

The TED dataset was preprocessed by the authors of http://www.lrec-conf.org/proceedings/lrec2016/pdf/103_Paper.pdf, and the resulting dataset is shared at https://drive.google.com/file/d/0B13Cc1a7ebTuMElFWGlYcUlVZ0k/view. I used this simple script to convert the format of the files: https://drive.google.com/open?id=1sW23C4kqRJ6rDSBurco8_0lJ3VZJIkta

ottokart avatar Mar 25 '19 08:03 ottokart

Thank you very much.

MingLunHan avatar Mar 25 '19 09:03 MingLunHan