
Semgrep tests fail with `Unsupported variant "Json"`

Open kam193 opened this issue 11 months ago • 4 comments

Describe the bug I recently updated my local Semgrep version, and when I run semgrep scan --test on my ruleset, Semgrep now fails with the following errors printed "live" in the console:

⠙ Loading rules...RPC response indicated an error: Error parsing RPC request:
Atdgen_runtime.Oj_run.Error("Line 1:\nUnsupported variant \"Json\"")
Raised at Atdgen_runtime__Oj_run.error_with_line in file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from Semgrep_output_v1_j.read_function_call.(fun) in file "OSS/src/rule/semgrep_output_v1_j.ml", line 28657, characters 26-107
Called from RPC.handle_single_request in file "OSS/src/rpc/RPC.ml", line 123, characters 13-67

And the following Python error at the end:

        signatures/infostealers.yaml: Traceback (most recent call last):
  File "/home/kamil/.pyenv/versions/3.10.8/envs/semgrep-rules/lib/python3.10/site-packages/semgrep/test.py", line 300, in invoke_semgrep_multi
    output = semgrep.run_scan.run_scan_and_return_json(config, targets, **kwargs)
  File "/home/kamil/.pyenv/versions/3.10.8/envs/semgrep-rules/lib/python3.10/site-packages/semgrep/run_scan.py", line 1034, in run_scan_and_return_json
    return json.loads(outputs[0][1])  # type: ignore
  File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The issue repeats for every Semgrep rule file I have. At the same time, scanning with the same rule files works without any trouble.
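
The Python traceback is probably a downstream symptom of the RPC error rather than a separate bug: json.loads raises exactly this message whenever its input does not start with a JSON value, for example when the input is empty. A minimal sketch, assuming the failed RPC call left the scan without valid JSON output (the helper name decode_error_message is hypothetical and not part of Semgrep):

```python
import json
from typing import Optional


def decode_error_message(text: str) -> Optional[str]:
    """Return the JSONDecodeError message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # str(err) has the form "<msg>: line L column C (char N)".
        return str(err)
    return None


# Empty output and non-JSON text both fail at the very first character.
print(decode_error_message(""))
print(decode_error_message("<ERROR: missing output>"))
print(decode_error_message('{"results": []}'))
```

Both the empty string and a non-JSON line such as <ERROR: missing output> (which shows up in the --json run later in this thread) produce "Expecting value: line 1 column 1 (char 0)", matching the traceback above.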

To Reproduce For me, any semgrep scan --test run using version 1.99 or 1.101 (I've tested those two) triggers the issue. Unfortunately, I don't have a minimal reproducing rule file to share at the moment. Everything works well in 1.94.

Expected behavior Tests are run.

What is the priority of the bug to you?

  • [ ] P0: blocking your adoption of Semgrep or workflow
  • [x] P1: important to fix or quite annoying
  • [ ] P2: regular bug that should get fixed

Environment I use Semgrep installed via PyPI

Use case What will fixing this bug enable for you?

Without fixing, I'll keep using Semgrep 1.94 locally.

kam193 avatar Jan 04 '25 19:01 kam193

I'll be happy to fix this if you can attach some example test files (both the rule and the test target).

aryx avatar Jan 09 '25 07:01 aryx

Hey, thanks for the answer, and sorry for the delay; I didn't have time to look at it. I've now found a reproducible case, and moreover, it's not only about the tests but about JSON output in general. I find it extremely weird, and I'm also trying to find out whether anything in my environment could influence it.

So, the rule (min.yaml):

rules:
  - id: exec-usage
    pattern-either:
      - pattern: eval(...)
      - pattern: exec(...)
    message: eval/exec usage
    severity: INFO
    languages:
      - python

The code (min.py):

# ruleid: exec-usage
exec("aaaa")

And the runs:

$ semgrep --version
1.107.0
$  semgrep scan -c min.yaml min.py
...
Ran 1 rule on 1 file: 1 finding.
$ semgrep scan --json -c min.yaml min.py
...                                                                                                                        
RPC response indicated an error: Error parsing RPC request:
Atdgen_runtime.Oj_run.Error("Line 1:\nUnsupported variant \"Json\"")
Raised at Atdgen_runtime__Oj_run.error_with_line in file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from Semgrep_output_v1_j.read_function_call.(fun) in file "OSS/src/rule/semgrep_output_v1_j.ml", line 28657, characters 26-107
Called from RPC.handle_single_request in file "OSS/src/rpc/RPC.ml", line 123, characters 13-67

<ERROR: missing output>
...
Ran 1 rule on 1 file: 1 finding.
$ semgrep scan --test -c min.yaml min.py
                                                                                                                        
...                                                                                                                        
RPC response indicated an error: Error parsing RPC request:
Atdgen_runtime.Oj_run.Error("Line 1:\nUnsupported variant \"Json\"")
Raised at Atdgen_runtime__Oj_run.error_with_line in file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from Semgrep_output_v1_j.read_function_call.(fun) in file "OSS/src/rule/semgrep_output_v1_j.ml", line 28657, characters 26-107
Called from RPC.handle_single_request in file "OSS/src/rpc/RPC.ml", line 123, characters 13-67

No unit tests found. See https://semgrep.dev/docs/writing-rules/testing-rules
No tests for fixes found.
--------------------------------------------------------------------------------
The following config files produced errors:
        min.yaml: Traceback (most recent call last):
  File "/home/kamil/.pyenv/versions/3.10.8/envs/semgrep-rules/lib/python3.10/site-packages/semgrep/test.py", line 300, in invoke_semgrep_multi
    output = semgrep.run_scan.run_scan_and_return_json(
  File "/home/kamil/.pyenv/versions/3.10.8/envs/semgrep-rules/lib/python3.10/site-packages/semgrep/run_scan.py", line 1085, in run_scan_and_return_json
    return json.loads(outputs[0][1])  # type: ignore
  File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
$ pip install "semgrep==1.94.0"
...
$ semgrep --version
1.94.0

⏫ A new version of Semgrep is available. See https://semgrep.dev/docs/upgrading
$ semgrep scan --test -c min.yaml min.py
...                                                                                                                     
1/1: ✓ All tests passed
No tests for fixes found.
$ semgrep scan --json -c min.yaml min.py
...                                                                                                                     
{"errors": [], "interfile_languages_used": [], "paths": {"scanned": ["min.py"]}, "results": [{"check_id": "exec-usage", "end": {"col": 13, "line": 2, "offset": 33}, "extra": {"engine_kind": "OSS", "fingerprint": "77f096e04b4fe7aa8378ad4101772d9d8fe65b8519e191601ba524e71016c71f96f3796295db0adfc2e69ca345d40040237222744ecaca866f19d6acb4636e85_0", "is_ignored": false, "lines": "exec(\"aaaa\")", "message": "eval/exec usage", "metadata": {}, "metavars": {}, "severity": "INFO", "validation_state": "NO_VALIDATOR"}, "path": "min.py", "start": {"col": 1, "line": 2, "offset": 21}}], "skipped_rules": [], "version": "1.94.0"}
...

Ran 1 rule on 1 file: 1 finding.

⏫ A new version of Semgrep is available. See https://semgrep.dev/docs/upgrading

A little about my environment:

$ python --version
Python 3.10.8
$ pip freeze
attrs==23.2.0
boltons==21.0.0
bracex==2.4
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
click-option-group==0.5.6
colorama==0.4.6
defusedxml==0.7.1
Deprecated==1.2.14
exceptiongroup==1.2.1
face==22.0.0
glom==22.1.0
googleapis-common-protos==1.63.2
idna==3.7
importlib_metadata==7.1.0
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
markdown-it-py==3.0.0
mdurl==0.1.2
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-http==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-requests==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
packaging==24.0
peewee==3.17.5
protobuf==4.25.4
Pygments==2.18.0
PyYAML==6.0.2
referencing==0.35.1
requests==2.32.2
rich==13.5.3
rpds-py==0.18.1
ruamel.yaml==0.17.40
ruamel.yaml.clib==0.2.8
semgrep==1.94.0
tomli==2.0.1
typing_extensions==4.12.0
urllib3==2.2.1
wcmatch==8.5.2
wrapt==1.16.0
zipp==3.19.2

I mean, it really looks related to the Semgrep version, but I can't imagine that no one else has noticed issues with JSON output or tests so far, so I suspect that something on my machine is influencing the results.

kam193 avatar Feb 08 '25 21:02 kam193

Ehh, I'm almost sure it's something with my environment. It works in a clean Docker container. I'm leaving this open in case you have some idea what's going on, but given that it works in a container even with all the dependency packages installed at the versions listed above, my environment has to be totally broken.
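
One way to narrow down an environment-specific failure like this is to check whether the semgrep executable found on PATH and the package version recorded for the active interpreter actually belong together. A rough sketch using only the Python standard library; the function names are hypothetical, and it inspects only the Python side of the install, not the semgrep-core binary:

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Version recorded in the active environment's package metadata, if any."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_install(package: str = "semgrep") -> dict:
    """Collect facts worth comparing between a working and a failing setup."""
    return {
        # First executable with this name on PATH; it may live in another environment.
        "executable_on_path": shutil.which(package),
        # Version pip recorded for the interpreter running this script.
        "package_version": installed_version(package),
    }
```

Printing describe_install() in both the failing environment and the clean container, and comparing the result with the output of semgrep --version, should show whether the shell is picking up an executable from a different installation than the one pip reports.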

kam193 avatar Feb 08 '25 21:02 kam193

I just ran into a similar issue:

RPC response indicated an error: Error parsing RPC request:
Atdgen_runtime.Oj_run.Error("Line 1:\nUnsupported variant \"CallGetTargets\"")
Raised at Atdgen_runtime__Oj_run.error_with_line in file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from RPC.handle_single_request in file "OSS/src/rpc/RPC.ml", line 96, characters 13-67

Failed to obtain target files from semgrep-core

However, I never got a json.decoder.JSONDecodeError. The semgrep command just silently failed to produce results even in files with known findings (e.g. rule tests). I suspect it has something to do with the Semgrep Pro semgrep-core binary. Could be a version mismatch between what semgrep is expecting and what the core binary is producing 🤷.

Anyway, I was able to fix it by removing semgrep-core-proprietary. The weird part is that I wasn't using any Pro features, so I'm not sure why that binary would be invoked and cause issues.
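
To check whether a leftover semgrep-core-proprietary binary sits next to semgrep-core, something like the following sketch could help. It assumes you already know the directory where your installation keeps its core binaries (the location depends on the install method and is not stated in this thread); find_core_binaries is a hypothetical helper, not a Semgrep API:

```python
from pathlib import Path
from typing import List


def find_core_binaries(directory: Path, prefix: str = "semgrep-core") -> List[str]:
    """Return sorted names of regular files in `directory` starting with `prefix`."""
    if not directory.is_dir():
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.name.startswith(prefix)
    )
```

If the returned list contains semgrep-core-proprietary in addition to semgrep-core, an older Pro binary may be a candidate for the kind of version mismatch suspected above.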

mschwager avatar Aug 19 '25 14:08 mschwager