
Error: 'charmap' codec can't encode characters

Open T0T4R4 opened this issue 1 year ago • 20 comments

Hi !

I'm following your tutorial for fine-tuning, but received the following error during training on my GPU :

  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1273, in train
    validation_results_tuple = self._validate_epoch(epoch=epoch, silent_mode=silent_mode)
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1759, in _validate_epoch
    return self.evaluate(
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1870, in evaluate
    sg_trainer_utils.display_epoch_summary(
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\super_gradients\training\utils\sg_trainer_utils.py", line 257, in display_epoch_summary
    summary_tree.show()
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\treelib\tree.py", line 854, in show
    print(self._reader)
  File "C:\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 20-22: character maps to <undefined>
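
The last frame of this traceback can be reproduced without super-gradients or treelib. Below is a minimal sketch; the sample line is an assumption (box-drawing characters of the kind a tree printout typically uses, plus an arrow), not the exact text the trainer prints. It only shows that such characters have no slot in cp1252, which is what "character maps to <undefined>" means.

```python
# Sample text is hypothetical: box-drawing characters plus an arrow.
sample_line = "└── Training ↗"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Only booleans are printed, so the demo itself is safe on a cp1252 console.
print("cp1252:", can_encode(sample_line, "cp1252"))
print("utf-8:", can_encode(sample_line, "utf-8"))
```

Plain ASCII text such as "Training" encodes fine in both, which is why the failure only shows up once the summary tree is printed.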

I tried simply commenting out the show() call, but at the end of the whole training I got this error:

Exception ignored in atexit callback: <function reset_all at 0x000001DE4A6357E0>
Traceback (most recent call last):
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\colorama\initialise.py", line 34, in reset_all
    AnsiToWin32(orig_stdout).reset_all()
  File "C:\Users\clert\Documents\_Dev\yolo-nas\venv\lib\site-packages\colorama\ansitowin32.py", line 189, in reset_all
    self.wrapped.write(Style.RESET_ALL)
  File "C:\Python310\lib\codecs.py", line 378, in write
    self.stream.write(data)
TypeError: write() argument must be str, not bytes

Environment:

  • OS : Windows 11
  • GPU : RTX 2060
  • Super Gradients version 3.1.1
  • Python 3.10, environment :
 absl-py==1.4.0
alabaster==0.7.13
antlr4-python3-runtime==4.9.3
attrs==23.1.0
Babel==2.12.1
boto3==1.26.126
botocore==1.29.126
build==0.10.0
cachetools==5.3.0
certifi==2022.12.7
chardet==4.0.0
charset-normalizer==3.1.0
click==8.1.3
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.0.7
coverage==5.3.1
cycler==0.10.0
Deprecated==1.2.13
docutils==0.17.1
einops==0.3.2
flatbuffers==23.3.3
fonttools==4.39.3
future==0.18.3
google-auth==2.17.3
google-auth-oauthlib==1.0.0
grpcio==1.54.0
humanfriendly==10.0
hydra-core==1.3.2
idna==2.10
imagesize==1.4.1
Jinja2==3.1.2
jmespath==1.0.1
json-tricks==3.16.1
jsonschema==4.17.3
kiwisolver==1.4.4
Markdown==3.4.3
markdown-it-py==2.2.0
MarkupSafe==2.1.2
matplotlib==3.7.1
mdurl==0.1.2
mpmath==1.3.0
numpy==1.23.0
oauthlib==3.2.2
omegaconf==2.3.0
onnx==1.13.0
onnx-simplifier==0.4.28
onnxruntime==1.13.1
opencv-python==4.7.0.72
packaging==23.1
pandas==2.0.1
Pillow==9.5.0
pip-tools==6.13.0
protobuf==3.20.3
psutil==5.9.5
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycocotools==2.0.4
pyDeprecate==0.3.2
Pygments==2.15.1
pyparsing==2.4.5
pyproject_hooks==1.0.0
pyreadline3==3.4.1
pyrsistent==0.19.3
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3
PyYAML==6.0
rapidfuzz==3.0.0
requests==2.30.0
requests-oauthlib==1.3.1
requests-toolbelt==1.0.0
rich==13.3.5
roboflow==1.0.8
rsa==4.9
s3transfer==0.6.0
scipy==1.10.1
seaborn==0.12.2
sentry-sdk==1.22.2
six==1.16.0
snowballstemmer==2.2.0
Sphinx==4.0.3
sphinx-rtd-theme==1.2.0
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
stringcase==1.2.0
super-gradients @ https://files.pythonhosted.org/packages/16/5b/a3e31ec12a6ce662ed8275f56cc9435a12d0d868109e24703a823ea88581/super_gradients-3.1.1-py3-none-any.whl#sha256=287476390285c31b69dbbbe1d45fe9f1bb654106f1a7403dd4500a3fd53a0294
sympy==1.11.1
tensorboard==2.12.3
tensorboard-data-server==0.7.0
termcolor==1.1.0
thop==0.1.1.post2209072238
tomli==2.0.1
torch @ file:///C:/Users/clert/Downloads/torch-1.13.1%2Bcu117-cp310-cp310-win_amd64.whl#sha256=978239684c6ec455ad2157ff33d44fdb9dd8d3a93b9d2f4ac7aa57691e990136
torch-tb-profiler==0.4.1
torchaudio @ file:///C:/Users/clert/Documents/_Dev/yolo-nas/torchaudio-0.13.1%2Bcu117-cp310-cp310-win_amd64.whl#sha256=2d821e7da413b193ed9acf59c9a4d2ae8704df4c6ff722da0fee77f569b11703
torchmetrics==0.8.0
torchvision @ file:///C:/Users/clert/Documents/_Dev/yolo-nas/torchvision-0.14.1%2Bcu117-cp310-cp310-win_amd64.whl#sha256=b39fc67e7131053d435804d7901e88528611c0832fd9f1cc26476b5a27cc5d81
tqdm==4.65.0
treelib==1.6.1
typing_extensions==4.5.0
tzdata==2023.3
ultralytics==8.0.99
urllib3==1.26.15
Werkzeug==2.3.3
wget==3.2
wrapt==1.15.0

T0T4R4 avatar May 14 '23 10:05 T0T4R4

Hi @T0T4R4 Can you please add the snippet of code ?

Meanwhile, you can try:

training_hyperparams['silent_mode'] = True
Trainer.train(..., training_hyperparams)

Louis-Dupont avatar May 14 '23 11:05 Louis-Dupont

Should I follow up the discussion on DashHub or here ?

Thanks @Louis-Dupont, it looks like changing silent_mode back to True made the code end gracefully (I had it explicitly set to False to follow what happens during training and estimate how long it takes). So thanks for the tip! 👍

Still, it would be nice to have it working while disabled :)

T0T4R4 avatar May 14 '23 17:05 T0T4R4

Let's continue here, it's more convenient for people to see the discussion :)

Definitely, I am just not fully sure what causes it. If you have a few seconds to run the following code it would help me understand what happens:

# Tree only
from treelib import Tree

train_tree = Tree()
train_tree.create_node("Training", "Training")

summary_tree = Tree()
summary_tree.create_node("MAIN", "Summary")
summary_tree.paste("Summary", train_tree)
summary_tree.show()

# Color only
from termcolor import colored

print(colored("Training", color="green"))

# Arrow only
print("↗")

And everything together

from treelib import Tree
from termcolor import colored

train_tree = Tree()
train_tree.create_node(colored("Training ↗", color="green"), "Training")

summary_tree = Tree()
summary_tree.create_node("MAIN", "Summary")
summary_tree.paste("Summary", train_tree)
summary_tree.show()

In theory, the last one should fail, and hopefully we can isolate which step leads to it with the first 3 tests. If it's the color or the arrow, we can simply add an option to deactivate it. If it's the tree library, then it's a bit more work because we would need to find an alternative way to display the results.

Louis-Dupont avatar May 15 '23 17:05 Louis-Dupont

Another thing you can try is to set the environment variable PYTHONIOENCODING=utf8
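
A hedged sketch of what this variable does, for anyone trying it: PYTHONIOENCODING is read when the interpreter starts, so it has to be set before Python launches, and it only applies to stdin/stdout/stderr, not to files opened later with open(). The helper name `child_stdout_encoding` is made up for illustration; it starts a fresh interpreter with the variable set and reports what that interpreter sees.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env: dict) -> str:
    """Start a fresh interpreter and return the normalized encoding of its sys.stdout."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Normalize spellings such as "utf8" / "UTF-8" to the codec's canonical name.
    return codecs.lookup(result.stdout.strip()).name


print(child_stdout_encoding({"PYTHONIOENCODING": "utf8"}))
```

If code later replaces sys.stdout with a different stream object, that new stream's encoding is whatever it was created with, regardless of this variable.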

Louis-Dupont avatar May 16 '23 07:05 Louis-Dupont

Thanks @Louis-Dupont there is no problem with the display of trees, I ran your piece of code successfully. Will try the env var during my next training...

T0T4R4 avatar May 16 '23 08:05 T0T4R4

Thanks! 🙏 If you can also run this and share the result: import sys; print(sys.getdefaultencoding())

Louis-Dupont avatar May 16 '23 08:05 Louis-Dupont

Screenshot (13)

UnicodeEncodeError: 'charmap' codec can't encode characters in position 21-23: character maps to <undefined>

Python = 3.10 cuda=11.7 pytorch = 1.13.1

Satyajit1993 avatar May 17 '23 03:05 Satyajit1993

Thanks! 🙏 If you can also run this and share the result: import sys; print(sys.getdefaultencoding())

it's already UTF-8

T0T4R4 avatar May 19 '23 08:05 T0T4R4

Hey all! I ran into a similar issue to @T0T4R4 earlier today. I'm running within a similar environment (torch = 1.13.1+cu117, super-gradients=3.1.1, python=3.10.10). I did a bit of tinkering and wanted to share some results, as well as a possible workaround for now.

First off, I tried setting the environment variables PYTHONIOENCODING = 'utf8' and PYTHONUTF8 = 1 independently, but neither seemed to work. I then tried the treelib snippets from @Louis-Dupont. Each of these test cases passed, but only if I ran them completely separately from the original trainer.train(). Essentially, if trainer.train() had been called in a previous notebook cell and failed, cases 1 and 3 of the treelib snippets would also fail with a similar UnicodeEncodeError (complaining about positions 6-8 instead of 20-22).

Then I went to try modifying the source code on my own copy of the super-gradients package. Commenting out summary_tree.show() did allow the training and validation to finish successfully, but obviously, I was unable to view the output. (NOTE: I did not run into the secondary error about a TypeError as experienced by @T0T4R4)

To get the function to work without error and with silent_mode = False (so I could view the output), I added the statement sys.stdout = sys.__stdout__ before summary_tree.show() instead of commenting it out (inspired by the fix here: #1021 - a different issue I was having, but possibly related). This allowed the training and validation to finish successfully and show the treelib output.

FWIW it appears to me that the encoding is being switched from utf8 to cp1252 somewhere along the way, and is not converted back to utf8 before summary_tree.show(). However, I don't know much at all about charsets, so take this with a grain of salt. Hope something in here helps!

sewty avatar May 19 '23 20:05 sewty

Hi @sewty, you described exactly what we both did ;) and what I think is happening as well! At this stage, I'm pretty much just commenting out the call to show() too 😅

Note that I also tried to change all calls to open a file by specifying the UTF8 encoding, and that didn't make any difference... On my end, the default encoding is UTF-8 anyway...

T0T4R4 avatar May 19 '23 21:05 T0T4R4

Yea @T0T4R4, it's strange that neither setting the environment variables nor changing all the calls that open a file, as you said, keeps the default encoding. To be clear, my final solution did not have summary_tree.show() commented out. It looks like this:

At the very end of function display_epoch_summary in sg_trainer_utils.py:

summary_tree = Tree()
summary_tree.create_node(f"SUMMARY OF EPOCH {epoch}", "Summary")
summary_tree.paste("Summary", train_tree)
summary_tree.paste("Summary", valid_tree)
sys.stdout = sys.__stdout__
summary_tree.show()

I would give this a try, along with silent_mode=False if you haven't already. The idea is to restore the original stdout stream with sys.stdout = sys.__stdout__ right before the call to summary_tree.show(), which fails on the cp1252-encoded stream you're seeing (if I understand correctly).
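
For readers who want to see the mechanism in isolation, here is a minimal sketch under stated assumptions: an in-memory cp1252 stream stands in for whatever replaced sys.stdout during training, and the helper name is made up for the demo. It restores the previously active stream instead of sys.__stdout__, which is a small variation on the workaround above so that the demo also behaves inside notebooks or test runners that capture output.

```python
import io
import sys


def print_with_temporary_stdout(text: str, encoding: str) -> bool:
    """Print `text` while sys.stdout uses `encoding`; return True if it worked."""
    previous_stdout = sys.stdout
    # Stand-in for a replaced stdout (assumption: an in-memory stream is close enough).
    sys.stdout = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text)
        sys.stdout.flush()
        return True
    except UnicodeEncodeError:
        return False
    finally:
        # Same idea as `sys.stdout = sys.__stdout__`: put a working stream back.
        sys.stdout = previous_stdout


tree_line = "└── Training ↗"  # hypothetical sample of a tree printout
print("cp1252 stream:", print_with_temporary_stdout(tree_line, "cp1252"))
print("utf-8 stream:", print_with_temporary_stdout(tree_line, "utf-8"))
```

The point is that print() consults whatever object sys.stdout refers to at call time, so swapping the stream back right before the failing print changes which encoder is used.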

sewty avatar May 19 '23 21:05 sewty

ok so my default encoding is actually 1252....

import locale
print( locale.getpreferredencoding())

returns cp1252
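
Since several different "default encodings" come up in this thread, here is a small sketch that prints them side by side. The comments describe standard Python 3 behaviour as I understand it; the actual values depend on the machine.

```python
import locale
import sys

encodings = {
    # Default for str/bytes conversions; always "utf-8" on Python 3.
    "sys.getdefaultencoding()": sys.getdefaultencoding(),
    # What open() falls back to when no encoding argument is passed.
    "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    # What print() uses right now; changes if sys.stdout is replaced.
    "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
}

for name, value in encodings.items():
    print(f"{name}: {value}")
```

So sys.getdefaultencoding() reporting utf-8 says nothing about the encoding of a stream created by a bare open() call.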

For Py 3.10.... I can read "Python opens source files as UTF-8 by default, but any interaction with the filesystem will depend on the environment. It's strongly recommended to use open(filename, encoding='utf-8') to read a file." so my initial approach trying to update all calls to open() to add UTF-8 might be the way...

For now I have edited treelib's tree.py on line 930 to force the UTF-8 encoding, and it passed.

if stdout:
    import sys
    sys.stdout.reconfigure(encoding='utf-8')
    print(self._reader)
else:
    return self._reader
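
A hedged sketch of the same reconfigure() call outside treelib, using an in-memory cp1252 stream as a stand-in for the console so it behaves the same on any OS. The helper name `force_utf8` is made up for illustration; the hasattr check is there because sys.stdout may be replaced by an object that has no reconfigure() method (available on io.TextIOWrapper since Python 3.7).

```python
import io


def force_utf8(stream) -> bool:
    """Switch a text stream to UTF-8 if it supports reconfigure(); report whether it did."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
        return True
    return False


# Stand-in for a cp1252 console stream (assumption for the demo).
demo_stream = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
force_utf8(demo_stream)

# This write would raise UnicodeEncodeError without the reconfigure above.
demo_stream.write("└── Training ↗\n")
demo_stream.flush()

print(demo_stream.encoding)
```

In real code the call would be force_utf8(sys.stdout) right before the print that fails.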

Edit: investigating up the 🪜 ladder, moving those 2 lines to display_epoch_summary in sg_trainer_utils.py works as well. And at least it does not clutter treelib.

T0T4R4 avatar May 19 '23 21:05 T0T4R4

Ok, I've been on this for 2 hours now 😅

As soon as the module initializes, it changes the charset to cp1252....

import sys
print(sys.stdout.encoding)

returns utf-8

from super_gradients.training import Trainer
print(sys.stdout.encoding)

returns cp1252

...🤔

T0T4R4 avatar May 19 '23 22:05 T0T4R4

After having spent so much time in the logging block.... it wasn't there 😅 !!

Just found the culprit:

common/abstractions/mute_processes.py

line 30 , mute_current_process()

must add encoding when opening a file 😁

Change sys.stdout = open(os.devnull, "w") to sys.stdout = open(os.devnull, "w", encoding="utf-8")
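
A minimal sketch of the difference that one argument makes, independent of super-gradients. Without an explicit encoding, open() picks a locale-dependent default (cp1252 on many Windows setups); with it, the stream is UTF-8 everywhere. The sample text is hypothetical.

```python
import os

# No encoding given: Python picks a locale-dependent default.
with open(os.devnull, "w") as default_stream:
    default_encoding = default_stream.encoding

# Encoding given explicitly: same result on every platform.
with open(os.devnull, "w", encoding="utf-8") as utf8_stream:
    explicit_encoding = utf8_stream.encoding
    # On a cp1252 stream this write would raise UnicodeEncodeError.
    utf8_stream.write("└── Training ↗\n")

print("default:", default_encoding)
print("explicit:", explicit_encoding)
```

On a machine where the first line prints cp1252, any non-cp1252 character written to that stream fails, even though the output is being discarded.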

T0T4R4 avatar May 19 '23 22:05 T0T4R4

PR pending @Louis-Dupont 🙂

T0T4R4 avatar May 19 '23 23:05 T0T4R4

@T0T4R4 @sewty I merged the fix to master, does that completely fix the treelib issue?

Louis-Dupont avatar May 22 '23 13:05 Louis-Dupont

@Louis-Dupont 'charmap' codec can't encode characters error is solved.

Thanks for the fix.

Satyajit1993 avatar May 24 '23 03:05 Satyajit1993

@Louis-Dupont , I just ran into the same error. However I do get an error when I run the snippet you provided.

import sys; 
sys.getdefaultencoding()
'utf-8'

image

mazatov avatar Jun 02 '23 09:06 mazatov

@mazatov - please check:

import locale
locale.getpreferredencoding()

This will give cp1252.

skyprince999 avatar Jun 11 '23 17:06 skyprince999

Fixed in 3.1.2

BloodAxe avatar Aug 10 '23 10:08 BloodAxe