(Windows-python) semgrep ver. 1.140.0 scan fail - "charmap codec can't encode character"

Open trombini77 opened this issue 2 months ago • 3 comments

Describe the bug

OS: Windows 11 Home x64
Python version: 3.13.7
Semgrep version: 1.140.0

After installing with python -m pip install semgrep and creating the configuration with SEMGREP_APP_TOKEN=<token> semgrep login, I ran semgrep scan and got the following error:

[00.08][WARNING](ca-certs): Ignored 1 trust anchors.

┌──── ○○○ ────┐
│ Semgrep CLI │
└─────────────┘

'charmap' codec can't encode character '\u202a' in position 1377874: character maps to <undefined>
Traceback (most recent call last):
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\commands\wrapper.py", line 51, in wrapper
    func(*args, **kwargs)
    ~~~~^^^^^^^^^^^^^^^^^
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\commands\scan.py", line 947, in scan
    ) = semgrep.run_scan.run_scan(
        ~~~~~~~~~~~~~~~~~~~~~~~~~^
        dump_command_for_core=dump_command_for_core,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<46 lines>...
        x_group_taint_rules=x_group_taint_rules,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
    return f(*args, **kwargs)
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\run_scan.py", line 1103, in run_scan
    configs_obj, config_errors = get_config(
                                 ~~~~~~~~~~^
        pattern,
        ^^^^^^^^
    ...<4 lines>...
        no_rewrite_rule_ids=no_rewrite_rule_ids,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
    return f(*args, **kwargs)
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 1027, in get_config
    config, errors = Config.from_config_list(config_strs, project_url)
                     ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
    return f(*args, **kwargs)
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 619, in from_config_list
    resolved_config, config_errors = resolve_config(config, project_url)
                                     ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
    return f(*args, **kwargs)
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 532, in resolve_config
    config, errors = parse_config_files(config_loader.load_config())
                     ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
    return f(*args, **kwargs)
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 507, in parse_config_files
    config_data, config_errors = future.result()
                                 ~~~~~~~~~~~~~^^
  File "C:\Python313\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "C:\Python313\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Python313\Lib\concurrent\futures\thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 467, in context_aware_parse_config_string
    return parse_config_string(*args, **kwargs)
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\tracing.py", line 303, in inner
    return f(*args, **kwargs)
  File "C:\Users\leand\AppData\Roaming\Python\Python313\site-packages\semgrep\config_resolver.py", line 832, in parse_config_string
    fp.write(contents)
    ~~~~~~~~^^^^^^^^^^
  File "C:\Python313\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u202a' in position 1377874: character maps to <undefined>

I also tried running semgrep with the system environment variable PYTHONIOENCODING=utf-8 set, but it failed with the same error.
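One likely reason PYTHONIOENCODING makes no difference: that variable only overrides the encoding of stdin, stdout, and stderr, while the traceback ends in fp.write(contents) going through the cp1252 codec, which suggests a file opened in text mode with the locale's default encoding. The sketch below is not Semgrep's code. The helper write_text and the file name rules.yaml are made up for illustration, and the encoding is passed explicitly so the behavior can be reproduced on any platform.

```python
import os
import tempfile

# U+202A (LEFT-TO-RIGHT EMBEDDING) is the character named in the traceback.
contents = "rule text \u202a more rule text"


def write_text(path, text, encoding):
    """Hypothetical helper: write text with an explicit encoding.

    Returns True on success and False if the codec cannot encode the text.
    """
    try:
        with open(path, "w", encoding=encoding) as fp:
            fp.write(text)
        return True
    except UnicodeEncodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rules.yaml")  # illustrative file name
    cp1252_ok = write_text(path, contents, "cp1252")  # mimics the Windows default here
    utf8_ok = write_text(path, contents, "utf-8")

print("cp1252 write succeeded:", cp1252_ok)
print("utf-8 write succeeded:", utf8_ok)
```

The cp1252 write fails because that code page has no mapping for U+202A, while UTF-8 can encode any Unicode character.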

To Reproduce

Described above

Expected behavior

Run the scan without error.

Screenshots

N/A

What is the priority of the bug to you?

  • [x] P0: blocking your adoption of Semgrep or workflow
  • [ ] P1: important to fix or quite annoying
  • [ ] P2: regular bug that should get fixed

Environment

Semgrep.dev

Use case

Test the tool to consider its adoption

Oct 26 '25 01:10 trombini77

Thanks for the report! This looks like a Windows-specific issue; we'll take a closer look.

Oct 30 '25 20:10 ajbt200128

Try running this in the PowerShell terminal right before the semgrep command. It worked for me when I had the same error:

$env:PYTHONUTF8="1"

Nov 03 '25 16:11 thomas-ncc

As @thomas-ncc suggested, try setting the PYTHONUTF8 environment variable. Our Windows installation instructions include that step as well. It will no longer be necessary as of Python 3.15, where UTF-8 mode is enabled by default. Please let us know if that works!
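A quick way to confirm that the variable takes effect is to ask a fresh interpreter whether UTF-8 mode is on and which encoding it would use by default for text files. This is only a sketch: probe_utf8_mode and PROBE are illustrative names, not part of Semgrep, and the check covers the Python interpreter only, not the scan itself.

```python
import os
import subprocess
import sys

# Code run in a child interpreter: prints the UTF-8 mode flag and the
# default text encoding that open() would fall back to.
PROBE = (
    "import sys, locale; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False).lower())"
)


def probe_utf8_mode(enabled):
    """Hypothetical helper: run PROBE in a child Python, with or without PYTHONUTF8=1."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)
    if enabled:
        env["PYTHONUTF8"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with_flag = probe_utf8_mode(True)
print("with PYTHONUTF8=1:", with_flag)
print("without PYTHONUTF8:", probe_utf8_mode(False))
```

With the variable set, the first field should be 1 and the encoding should be UTF-8. Without it, a Windows machine like the one in this report would typically show 0 and cp1252.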

Nov 18 '25 21:11 nmote