practical-pytorch

shakespeare.txt is not found

Open f2012444 opened this issue 7 years ago • 4 comments

The shakespeare.txt file used in char-rnn-generation is not present in the data folder.

f2012444 avatar May 15 '18 23:05 f2012444

Also came here for this. Google shows a few courses that also load text from shakespeare.txt; all of the examples link to a copy of Shakespeare from Project Gutenberg, link below.

http://www.gutenberg.org/files/100/100-0.txt

I haven't tried loading this yet. It will likely need one of the tools for cleaning up Project Gutenberg texts (i.e. removing headers etc.).
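For the header removal mentioned above, here is a minimal sketch rather than any particular cleanup tool. It assumes the download carries the usual `*** START OF THE/THIS PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ...` marker lines; that exact wording is an assumption and should be checked against the file. The function name is hypothetical.

```python
import re

# Assumed marker lines; verify the wording against the downloaded file.
START_RE = re.compile(
    r"^\*{3}\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*{3}\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END marker lines.

    If a marker is missing, fall back to the start or end of the text.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    stop = end.start() if end else len(text)
    return text[begin:stop].strip()
```

Anything before the START line and after the END line (license text, transcriber notes) is dropped; front matter inside the markers would still need manual cleanup.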

aspiringguru avatar May 27 '18 07:05 aspiringguru

Very rough cleanup. Used chapterize for a first-pass cleanup of http://www.gutenberg.org/files/100/100-0.txt:

chapterize shakespear_all.txt --nochapters

Then manually copied out the first book, 'sonnets', and used this ugly tool to strip blank lines and lines with ints: https://github.com/aspiringguru/practical-pytorch/blob/master/data/gutenberg_cleanup.py

Resulting in this (needs more cleanup, but eh, will see how the notebook copes): https://github.com/aspiringguru/practical-pytorch/blob/master/data/shakespear_sonnets_out.txt

I'm not proud of it. :)
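The blank-line and integer-line stripping described above could look roughly like this. It is a sketch only, not the contents of the linked gutenberg_cleanup.py, and it assumes "lines with ints" means lines that consist of nothing but a number (such as sonnet numbers). The function name is hypothetical.

```python
def strip_blank_and_numeric_lines(text):
    """Drop empty lines and lines that contain only digits.

    Assumption: a "line with ints" is a line such as "  12  " that holds
    nothing but a number; lines mixing words and digits are kept.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit():
            continue
        kept.append(line.rstrip())
    # End with a single newline unless nothing survived.
    return "\n".join(kept) + ("\n" if kept else "")
```

Leading indentation is preserved on the kept lines, since verse layout may matter to a character-level model.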

aspiringguru avatar May 27 '18 09:05 aspiringguru

This file has the same length. From karpathy's repository. https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt

zehongs avatar May 27 '18 19:05 zehongs

👍 That's the one. As mentioned in the readme: https://github.com/spro/practical-pytorch/tree/master/char-rnn-generation
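To put that file where the notebook looks for it, something like the following might work. It is a sketch under assumptions: the `raw.githubusercontent.com` URL is derived from the blob link above, the destination `data/shakespeare.txt` is inferred from the issue text, and both function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def blob_to_raw(url):
    """Convert a github.com blob URL into its raw.githubusercontent.com form.

    Assumes the branch or tag name contains no "/" characters.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner/repo/blob/ref/path/to/file
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *path])


def fetch_corpus(blob_url, dest="data/shakespeare.txt"):
    """Download the file behind blob_url and save it at dest (assumed path)."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(blob_to_raw(blob_url)) as response:
        dest_path.write_bytes(response.read())
    return dest_path


# Example call (needs network access), run from the repository root:
# fetch_corpus("https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt")
```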

spro avatar May 27 '18 19:05 spro