stanford-tensorflow-tutorials

How to use the Ubuntu Dialog Corpus

Open · bandarikanth opened this issue on Oct 12, 2017 · 1 comment

Hi, we have successfully trained the Stanford chatbot on the Cornell Movie Dialog Corpus, but it gives random answers. We are trying to use the Ubuntu Dialog Corpus instead, but we are unable to preprocess it. How can we convert it to a format similar to the Cornell Movie Dialog Corpus? Otherwise, can you suggest another dataset similar to the Cornell Movie Dialog Corpus?

Thanks in advance.

bandarikanth · Oct 12 '17 05:10
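The thread does not include a conversion script, so here is a hedged, minimal sketch of one way to turn the Ubuntu Dialogue Corpus into question/answer pairs. It assumes the v2 `train.csv` layout (columns `Context`, `Utterance`, `Label`, with `__eou__` ending each utterance and `__eot__` ending each speaker turn); verify that against your copy of the data. It keeps only rows labelled as the true response and pairs the last context turn with that response. The function names and the line-aligned "encoder/decoder" output files are hypothetical choices for illustration, not the format the Stanford chatbot's loader is confirmed to expect.

```python
import csv
import io


def clean_turn(turn):
    """Drop the __eou__ markers and collapse whitespace in one turn."""
    return " ".join(turn.replace("__eou__", " ").split())


def ubuntu_rows_to_pairs(csv_file):
    """Yield (question, answer) pairs from an Ubuntu Dialogue Corpus CSV.

    Assumes columns Context, Utterance, Label (v2 train.csv layout).
    Only rows with Label == 1 (the true response) are kept.
    """
    for row in csv.DictReader(csv_file):
        if float(row["Label"]) != 1.0:
            continue  # skip negative (distractor) responses
        # __eot__ separates speaker turns; the last non-empty one is the "question"
        turns = [t for t in row["Context"].split("__eot__") if t.strip()]
        if not turns:
            continue
        question = clean_turn(turns[-1])
        answer = clean_turn(row["Utterance"])
        if question and answer:
            yield question, answer


def write_pairs(pairs, enc_path, dec_path):
    """Write questions and answers line-aligned into two files (hypothetical layout)."""
    with open(enc_path, "w", encoding="utf-8") as enc, \
         open(dec_path, "w", encoding="utf-8") as dec:
        for question, answer in pairs:
            enc.write(question + "\n")
            dec.write(answer + "\n")


if __name__ == "__main__":
    # Tiny in-memory sample in the assumed train.csv layout
    sample = io.StringIO(
        "Context,Utterance,Label\n"
        '"hi __eou__ __eot__ how do i mount a usb drive __eou__ __eot__",'
        '"use the mount command __eou__",1.0\n'
    )
    for q, a in ubuntu_rows_to_pairs(sample):
        print(q, "->", a)
```

For the real file, open it with `open("train.csv", newline="", encoding="utf-8")` as the `csv` module recommends, and pass the generator to `write_pairs`. Using only the last turn keeps each pair short and avoids the heavy duplication you would get from overlapping contexts; pairing every consecutive turn inside `Context` would yield more data at the cost of many repeated pairs.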

You can find a well-organized, cleaned conversational dataset (about 160K pairs) for training a chatbot here: https://github.com/bshao001/ChatLearner. That repository also contains scripts and instructions for preprocessing Reddit data in case you need more (such as a million pairs).

bshao001 · Nov 10 '17 13:11