makemore

Added --input-file-encoding as a command line argument

Open JohanNorberg opened this issue 2 years ago • 1 comment

I wanted to train the program to generate more Swedish names. These contain special characters like Å and Ö, so the file has to be read as UTF-8. On Windows (at least on my machine) this is a problem because the default encoding is cp1252, so the characters get decoded incorrectly. I added a command line argument that lets me specify the encoding.
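A minimal sketch of what such an option could look like, assuming the script parses its arguments with argparse and reads the input with the built-in `open()`. This is not the actual patch: the function names `build_parser`, `read_words` and `main` are hypothetical, and only the `-i` and `--input-file-encoding` flags come from the commands shown in this issue.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the flags relevant to this issue are shown.
    parser = argparse.ArgumentParser(description="input encoding option (sketch)")
    parser.add_argument("--input-file", "-i", type=str, default="names.txt",
                        help="input file with one name per line")
    parser.add_argument("--input-file-encoding", type=str, default=None,
                        help="encoding used to read the input file, e.g. utf-8; "
                             "None falls back to the platform default")
    return parser


def read_words(path, encoding=None):
    # Passing encoding=None keeps the old behaviour: open() then uses the
    # locale's preferred encoding, which is often cp1252 on Windows.
    with open(path, "r", encoding=encoding) as f:
        data = f.read()
    return [w.strip() for w in data.splitlines() if w.strip()]


def main(argv=None):
    args = build_parser().parse_args(argv)
    words = read_words(args.input_file, args.input_file_encoding)
    chars = sorted(set("".join(words)))
    print(f"number of unique characters in the vocabulary: {len(chars)}")
    print("vocabulary:")
    print("".join(chars))
    return chars
```

Calling `main(["-i", "swe_names.txt", "--input-file-encoding", "utf-8"])` would read the file as UTF-8. Leaving the flag out keeps the platform default, so existing invocations behave as before.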

Wrong:

python .\makemore.py -i .\swe_names.txt -o swe_names

number of unique characters in the vocabulary: 55
vocabulary:
 -ABCDEFGHIJKLMNOPRSTUVWYabcdefghijklmnopqrstuvxy¥©¶Ã–…

Correct:

python .\makemore.py -i .\swe_names.txt -o swe_names --input-file-encoding utf-8

number of unique characters in the vocabulary: 54
vocabulary:
 -ABCDEFGHIJKLMNOPRSTUVWYabcdefghijklmnopqrstuvxyÅÖåéö
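The difference between the two vocabularies can be reproduced without the names file. The sketch below assumes only that the five non-ASCII characters are stored as UTF-8 and then decoded as cp1252. Each one becomes two characters, and all five share the same first byte, so 5 real characters turn into 6 spurious ones. That would account for the count going from 54 to 55.

```python
# The five non-ASCII characters from the correct vocabulary above.
swedish = "ÅÖåéö"

# What is stored on disk when the file is saved as UTF-8 ...
raw = swedish.encode("utf-8")

# ... and what comes out when those bytes are decoded as cp1252 instead.
misread = raw.decode("cp1252")

garbled_vocab = "".join(sorted(set(misread)))
print(garbled_vocab)
print(len(set(swedish)), "characters become", len(set(misread)))
```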

swe_names.txt

Btw, watching all of your videos on YT, they are great!

JohanNorberg, Jul 22 '23 22:07

I agree. The first thing I did when experimenting with makemore was adding that option to let it generate French words.

thbz, Apr 06 '24 16:04