2017
Intent to participate [First lines of novels]
A tiny dataset produced mixed results in my first attempt to generate the first sentence of a novel http://aiweirdness.com/post/167049313837/a-neural-network-tries-writing-the-first-sentence
Highlights:
- There was a man and he had seventy first sight.
- It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of my life, fire of my loins.
Lowlights:
- Stop! I caused the Narguuse man who was new on Alabama, the screaming constipated eggs.
- I am an angry grass, the symposium square, proved fatal to the throbbing, the howling wind tire…
The really big repositories I've found (Project Gutenberg, for example) are formatted inconsistently enough that they're difficult to scrape.
So now I'm crowdsourcing a larger dataset: https://docs.google.com/forms/d/e/1FAIpQLScod8P-kcLX98u6gT0rX6-20GwkDo_glz-okVVkrhr6KgQONQ/viewform. This has been posted for about 36 hours and already has 3532 submissions (not all unique). People are welcome to contribute through this form - or let me know if you have a smarter way to contribute a dataset.
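Since not all submissions are unique, the form responses would need deduplicating before training. Here is a minimal sketch of one way to do that, assuming the responses have been exported as a list of strings; the `normalize` and `dedupe` helpers and the sample lines are hypothetical, not part of the actual pipeline.

```python
import unicodedata


def normalize(line: str) -> str:
    """Build a comparison key: Unicode-normalized, whitespace-collapsed, case-folded."""
    text = unicodedata.normalize("NFKC", line)
    return " ".join(text.split()).casefold()


def dedupe(lines):
    """Drop blank lines and repeats, keeping the first spelling of each line in order."""
    seen = set()
    unique = []
    for line in lines:
        key = normalize(line)
        if key and key not in seen:
            seen.add(key)
            unique.append(line.strip())
    return unique


# Hypothetical sample submissions, for illustration only.
submissions = [
    "Call me Ishmael.",
    "call me  Ishmael. ",
    "",
    "It was a dark and stormy night.",
]
print(dedupe(submissions))
```

This only catches duplicates that differ in case or spacing; submissions that differ in punctuation or wording would still count as distinct.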
At the end of the month, I'll try again with a hopefully much larger dataset, and post the results and dataset afterwards, as well as a link to whatever open-source package I end up using. It won't produce a full novel in the traditional sense, but I'll declare a moral victory if a human announces their admiration of one of the neural network's lines.
Marking this one complete! Big thanks to everyone who contributed to the dataset.
Writeup and highlights here: http://aiweirdness.com/post/168051907512/the-first-line-of-a-novel-by-an-improved-neural
I ended up using a syll-rnn (lstm mode) to do the generation, which ran for about 16 hours on my MacBook. Syll-rnn seems to be better at larger datasets than char-rnn, yet can handle a larger vocabulary than word-rnn. Here's the framework I used:
https://github.com/learningtitans/torch-rnn/blob/valle-syllables/doc/flags.md#preprocessing
Sequence length was 40 syllables (based roughly on the number of syllables in "It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife."). LSTM size is 512, 3 layers (based on what would fit on my computer; I'm running a 1064-size LSTM now but it's taking a long time and it's not clear that the results will be any better).
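For anyone wanting to sanity-check a sequence length against a sample sentence, a rough syllable estimate can be sketched as below. This is a vowel-group heuristic of my own choosing, not the tokenizer the syllable framework uses, so its count may differ from the number of tokens the model actually sees; `estimate_syllables` is a hypothetical helper.

```python
import re


def estimate_syllables(word: str) -> int:
    """Estimate syllables as runs of vowels, discounting a likely silent final 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("wife", "fortune"), except in "-le" endings ("single").
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


sentence = (
    "It is a truth universally acknowledged that a single man in possession "
    "of a good fortune must be in want of a wife."
)
words = re.findall(r"[a-z]+", sentence.lower())
total = sum(estimate_syllables(w) for w in words)
print(f"{len(words)} words, roughly {total} syllables")
```

The heuristic miscounts some words (it over-counts "acknowledged", for instance), so treat the total as a ballpark figure when picking a sequence length.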
140,000 words of output available here. Unfortunately, due to a prank in the input data that I didn’t catch till after I trained the neural network, 37,000 of them are the word “sand”.
https://github.com/janelleshane/novel-first-lines-dataset/blob/master/output_checkpoint10000_temp0p6.txt
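By the figures above, "sand" makes up about 26% of the output (37,000 of 140,000 words). A word-frequency count over the training data or the output is one way a prank like this could be spotted earlier. The sketch below assumes the text is already loaded as a string; `word_counts` and the sample string are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase word tokens (letters and apostrophes) in a block of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Hypothetical sample; in practice this would be the contents of the dataset or output file.
sample = "and the eternal sand sand sand sand"
counts = word_counts(sample)
total = sum(counts.values())

# Show the most frequent words and the share taken by the single most common one.
top_word, top_count = counts.most_common(1)[0]
print(counts.most_common(3))
print(f"'{top_word}' accounts for {top_count / total:.0%} of the words")
```

A single word taking an unusually large share of the total is a hint that something in the input deserves a closer look before training.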
Crowdsourced dataset available here: https://github.com/janelleshane/novel-first-lines-dataset
(We're using issues as a sort of forum, so I'll re-open this to make it easier to find.)
Good stuff!
> Unfortunately, due to a prank in the input data that I didn’t catch till after I trained the neural network, 37,000 of them are the word “sand”.
I think the eternal sand is quite appropriate for NaNoGenMo!
As a way at the ground, and the cat could have been in the town and a shock and the type on the back of the pilsage and belched and the color of the great little person who was still and the imface of the decoction of the heat between the box against the three interesting seament and the eternal sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand ...
Thanks for clearing that up! And for adding the completed tag!
Yes, eternal sand. People have been making Star Wars jokes at me all day.