
word_language_model example throws UnicodeEncodeError

Status: Open · opened by miebster · 0 comments


Context

  • Python 3.11
  • PyTorch version: 2.1.1
  • Operating System and version: Windows 10 Pro, 22H2, 19045.3693

Your Environment

  • torch installed via "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121"
  • pytorch_examples cloned from github
  • Using CUDA/GPU
  • Which example are you using: word_language_model
  • Link to code or data to repro [if any]: https://github.com/pytorch/examples/tree/main/word_language_model

Expected Behavior

Following the commands in the word_language_model README should finish without error.

Current Behavior

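When running generate.py, a UnicodeEncodeError is raised while writing 'ზ' (U+10D6) to the output file. Because the output file is opened without an encoding argument, Python falls back to the Windows locale encoding (cp1252, as the traceback shows), which cannot represent this character.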
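The failure can be reproduced outside the example with a minimal sketch. It assumes the default encoding on this machine is cp1252 (the codec named in the traceback) and forces that encoding explicitly, so the same error appears on any OS. The file name generated.txt is a placeholder, not the example's real output path.

```python
import os
import tempfile

word = "\u10d6"  # 'ზ', GEORGIAN LETTER ZEN, the character named in the traceback
error = None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "generated.txt")  # placeholder output file
    try:
        # encoding="cp1252" is forced to mimic the Windows default seen in the traceback.
        with open(path, "w", encoding="cp1252") as outf:
            outf.write(word + " ")
    except UnicodeEncodeError as exc:
        error = exc  # keep a reference; 'exc' is cleared when the except block ends

print(error)
```

cp1252 is a single-byte code page covering Western European characters only, so any Georgian letter in the WikiText-2 vocabulary triggers this error as soon as the sampler emits it.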

Possible Solution

I resolved the issue by passing an explicit encoding to open() on line 66 of generate.py:

from: with open(args.outf, 'w') as outf:

to: with open(args.outf, 'w', encoding="utf-8") as outf:
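A minimal sketch of the fix, assuming a hypothetical list of sampled words in place of the model output. The write call mirrors the one shown in the traceback (a newline after every 20th word, a space otherwise); with encoding="utf-8" the result no longer depends on the Windows code page.

```python
import os
import tempfile

# Hypothetical stand-in for the words produced by generate.py's sampling loop.
words = ["the", "\u10d6", "<eos>", "model"] * 10

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "generated.txt")  # placeholder output file

    # Explicit UTF-8 can encode every character in the vocabulary, unlike cp1252.
    with open(path, "w", encoding="utf-8") as outf:
        for i, word in enumerate(words):
            # Same layout as the traceback's write: newline after every 20th word.
            outf.write(word + ("\n" if i % 20 == 19 else " "))

    # Read back with the same encoding to confirm the Georgian letter survived.
    with open(path, encoding="utf-8") as inf:
        text = inf.read()

print(text.count("\u10d6"))
```

Reading the file back with a different encoding (for example cp1252) would not raise here but would show mojibake, so any downstream tool should also be told the file is UTF-8.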

Steps to Reproduce

cd .\word_language_model
python main.py --cuda --epochs 6
python generate.py

Failure Logs [if any]

| Generated 0/1000 words
| Generated 100/1000 words
| Generated 200/1000 words
| Generated 300/1000 words
| Generated 400/1000 words
| Generated 500/1000 words
Traceback (most recent call last):
  File "REDACTED\word_language_model\generate.py", line 83, in <module>
    outf.write(word + ('\n' if i % 20 == 19 else ' '))
  File "REDACTED\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u10d6' in position 0: character maps to <undefined>

miebster, Dec 02 '23 04:12