(Windows-python) semgrep ver. 1.140.0 scan fail - "charmap codec can't encode character"
Describe the bug
OS: Windows 11 Home x64
Python version: 3.13.7
Semgrep version: 1.140.0
After installing with python -m pip install semgrep and creating the configuration with SEMGREP_APP_TOKEN=<token> semgrep login, I ran semgrep scan and got the following error:
[00.08][WARNING](ca-certs): Ignored 1 trust anchors.
┌──── ○○○ ────┐
│ Semgrep CLI │
└─────────────┘
'charmap' codec can't encode character '\u202a' in position 1377874: character maps to <undefined>
Traceback (most recent call last):
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\commands\wrapper.py", line 51, in wrapper
func(*args, **kwargs)
~~~~^^^^^^^^^^^^^^^^^
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\commands\scan.py", line 947, in scan
) = semgrep.run_scan.run_scan(
~~~~~~~~~~~~~~~~~~~~~~~~~^
dump_command_for_core=dump_command_for_core,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<46 lines>...
x_group_taint_rules=x_group_taint_rules,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
return f(*args, **kwargs)
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\run_scan.py", line 1103, in run_scan
configs_obj, config_errors = get_config(
~~~~~~~~~~^
pattern,
^^^^^^^^
...<4 lines>...
no_rewrite_rule_ids=no_rewrite_rule_ids,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
return f(*args, **kwargs)
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 1027, in get_config
config, errors = Config.from_config_list(config_strs, project_url)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
return f(*args, **kwargs)
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 619, in from_config_list
resolved_config, config_errors = resolve_config(config, project_url)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
return f(*args, **kwargs)
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 532, in resolve_config
config, errors = parse_config_files(config_loader.load_config())
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
return f(*args, **kwargs)
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 507, in parse_config_files
config_data, config_errors = future.result()
~~~~~~~~~~~~~^^
File "C:\Python313\Lib\concurrent\futures\_base.py", line 449, in result
return self.__get_result()
~~~~~~~~~~~~~~~~~^^
File "C:\Python313\Lib\concurrent\futures\_base.py", line 401, in __get_result
raise self._exception
File "C:\Python313\Lib\concurrent\futures\thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 467, in context_aware_parse_config_string
return parse_config_string(*args, **kwargs)
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
return f(*args, **kwargs)
File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 832, in parse_config_string
fp.write(contents)
~~~~~~~~^^^^^^^^^^
File "C:\Python313\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u202a' in position 1377874: character maps to <undefined>
I also tried running semgrep with the system environment variable PYTHONIOENCODING=utf-8 set, but got the same failure.
To Reproduce
Described above
Expected behavior
Run the scan without error.
Screenshots
N/A
What is the priority of the bug to you?
- [x] P0: blocking your adoption of Semgrep or workflow
- [ ] P1: important to fix or quite annoying
- [ ] P2: regular bug that should get fixed
Environment
Semgrep.dev
Use case
Test the tool to consider its adoption
Thanks for the report! This looks like a Windows-specific issue; we'll take a closer look.
Try running this in the PowerShell terminal right before the semgrep command; it worked for me when I had the same error:
$env:PYTHONUTF8="1"
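To confirm the variable actually took effect in that PowerShell session, a quick check along these lines may help. This is only a sketch and not part of Semgrep: describe_text_encoding is a hypothetical helper name, and it relies solely on the standard library's sys.flags.utf8_mode and locale.getpreferredencoding.

```python
import locale
import sys


def describe_text_encoding() -> dict:
    """Report the settings that influence how open() encodes text by default."""
    return {
        # 1 when Python was started with PYTHONUTF8=1 or -X utf8, otherwise 0.
        "utf8_mode": sys.flags.utf8_mode,
        # Locale-derived encoding that open() falls back to when no
        # encoding= argument is passed (cp1252 in the traceback above).
        "default_open_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in describe_text_encoding().items():
        print(f"{name}: {value}")
```

Run it with the same python that semgrep was installed into. If utf8_mode still reports 0, the variable was probably set in a different shell session.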
As @thomas-ncc suggested, try setting the PYTHONUTF8 environment variable; our Windows installation instructions include the same step. It will no longer be necessary as of Python 3.15, where UTF-8 mode is enabled by default. Please let us know if that works!
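For anyone curious why the variable helps, here is a small standalone sketch of the failure. It is not Semgrep's code: it assumes the file written at fp.write(contents) in the traceback was opened without an explicit encoding, which the cp1252 frame suggests. SAMPLE, can_encode, and write_text_utf8 are hypothetical names used only for illustration.

```python
import os
import tempfile

# U+202A (LEFT-TO-RIGHT EMBEDDING) is the character named in the traceback.
SAMPLE = "rule text \u202a with a bidi control character"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_text_utf8(text: str) -> str:
    """Write `text` to a temp file using explicit UTF-8; return the file path."""
    # Passing encoding= sidesteps the locale default (cp1252 on this machine).
    with tempfile.NamedTemporaryFile(
        "w", suffix=".yaml", encoding="utf-8", delete=False
    ) as fp:
        fp.write(text)
        return fp.name


if __name__ == "__main__":
    print("cp1252 can encode sample:", can_encode(SAMPLE, "cp1252"))
    print("utf-8 can encode sample:", can_encode(SAMPLE, "utf-8"))
    path = write_text_utf8(SAMPLE)
    os.remove(path)
```

With PYTHONUTF8=1, open() calls that omit encoding= default to UTF-8, so the write succeeds. PYTHONIOENCODING only changes stdin, stdout, and stderr, which would explain why it made no difference here.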