Anmol Joshi

17 comments by Anmol Joshi

@zhangguanheng66 @cpuhrsch I have incorporated changes requested in an earlier review and made some additional changes. Here is a summary:

- Removed any code related to extract_archive fix and moved...

Reading from the [website](http://www.statmt.org/wmt11/translation-task.html#download), 2009 is the largest dataset.

> - From Europarl (403MB) md5 sha1
> - From the News Commentary corpus (41MB) md5 sha1
> - From the...

Is the idea to have multiple functions for different years' datasets or provide an argument for the year?

Should I update the current dataset to 2009? Which other datasets would you want to provide?

@zhangguanheng66 thanks for the comments. I've added an option where users can pass the year and a table in the docstrings with details about the news crawl datasets by year....

@zhangguanheng66 I saw the discussion in #691 and #690 and the code in #696. Is there value in decoupling vocab and LanguageModelingDataset as well?

Thanks! Let me know if any other changes are needed on this PR!