meena-chatbot
Where is the source file on the nlpl page exactly?
The notebook references http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/mono/OpenSubtitles.it.gz
as the source. When I visit the linked opus.nlpl.eu page, I see a grid with a bunch of LANG.xml.gz files, but I cannot locate a file for any language other than Italian. Can you link me to the exact page where I can find alternatives to Italian, so that I can train the model on a different data source?
https://opus.nlpl.eu/OpenSubtitles-v2018.php is the page with all the conversational datasets provided by OpenSubtitles. Look for the first row in the second table, which corresponds to the monolingual plain text files (tokenized).
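If you would rather build the link than click through the table, here is a minimal sketch. It assumes that the other languages follow the same URL pattern as the Italian file referenced in the notebook, with only the language code changed; I have not checked this for every language, so confirm the link against the table before relying on it. The names `mono_url` and `download` are hypothetical helpers, not part of the notebook.

```python
import urllib.request

# Pattern copied from the Italian URL used in the notebook.
# Assumption: only the language code differs between files.
URL_TEMPLATE = (
    "http://opus.nlpl.eu/download.php"
    "?f=OpenSubtitles/v2018/mono/OpenSubtitles.{lang}.gz"
)


def mono_url(lang: str) -> str:
    """Return the assumed download URL for a monolingual tokenized file."""
    return URL_TEMPLATE.format(lang=lang)


def download(lang: str, target: str) -> None:
    """Download the gzipped file for `lang` to the local path `target`."""
    urllib.request.urlretrieve(mono_url(lang), target)


# Example: print the assumed link for English without downloading anything.
print(mono_url("en"))
```

Calling `download("en", "OpenSubtitles.en.gz")` would then fetch the file, provided the link exists for that language code.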