ofxMSATensorFlow

Original corpora from which char-rnn models were created

Open • dhowe opened this issue 6 years ago • 1 comment

Is this available somewhere (either as links or as a download)? Thanks.

dhowe • Jun 19 '18 12:06

I didn't get round to posting the original corpora; it's still on my todo list. In the meantime, you can find most of it on http://www.gutenberg.org/ (though you may have to clean the files a bit to remove the erroneous characters and disclaimers).
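Cleaning a Gutenberg download mostly means cutting the license header and footer and dropping stray control characters. Below is a minimal Python sketch, assuming the file marks the book body with `*** START OF ... PROJECT GUTENBERG EBOOK ...` and `*** END OF ... PROJECT GUTENBERG EBOOK ...` lines. The wording of these markers has varied between releases, so check your files. The function name `strip_gutenberg_boilerplate` is hypothetical.

```python
import re

# Assumed marker formats: Gutenberg has used both "THIS" and "THE" in these lines.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the book body between the START/END markers, minus control characters."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    # Fall back to the start/end of the whole text if a marker is missing.
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() > begin else len(text)
    body = text[begin:finish]
    # Normalise Windows line endings, then drop non-printable characters
    # (NUL bytes, stray carriage returns, byte-order marks), keeping newlines and tabs.
    body = body.replace("\r\n", "\n")
    body = "".join(ch for ch in body if ch.isprintable() or ch in "\n\t")
    return body.strip()
```

Read each file as text before passing it in; older Gutenberg files may not be UTF-8, so the encoding is worth checking per file.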

There are a number of Trump corpora around, e.g. http://www.thegrammarlab.com/?nor-portfolio=corpus-of-presidential-speeches-cops-and-a-clintontrump-corpus and https://github.com/ryanmcdermott/trump-speeches

Linux kernel source code at https://github.com/torvalds/linux
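char-rnn style training scripts typically read a single plain-text file, so the kernel tree has to be flattened first. Here is a minimal sketch, assuming you want every `.c` and `.h` file concatenated in sorted path order. `build_corpus` and the output path are hypothetical, and this is not necessarily how the original corpus was built.

```python
from pathlib import Path


def build_corpus(src_root: str, out_path: str, exts=(".c", ".h")) -> int:
    """Concatenate every file under src_root whose suffix is in `exts` into out_path.

    Returns the number of files written. Paths are sorted so the output is reproducible.
    """
    root = Path(src_root)
    paths = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in exts)
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for p in paths:
            # errors="replace" keeps going if a file is not valid UTF-8
            text = p.read_text(encoding="utf-8", errors="replace")
            out.write(text)
            # Make sure consecutive files don't run together on one line
            if not text.endswith("\n"):
                out.write("\n")
            count += 1
    return count
```

On a full kernel checkout, for example `build_corpus("linux", "linux_input.txt")`, this produces a very large file, so you may want to train on a subset of directories.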

The love song lyrics I scraped from a number of websites (I can't remember which ones right now; I'll try and dig them up).

memo • Jun 19 '18 14:06