Semgrep tests fail with `Unsupported variant "Json"`
Describe the bug
I recently updated my local Semgrep version, and when running semgrep scan --test on my ruleset, Semgrep started failing with the following errors printed "live" in the console:
⠙ Loading rules...RPC response indicated an error: Error parsing RPC request:
Atdgen_runtime.Oj_run.Error("Line 1:\nUnsupported variant \"Json\"")
Raised at Atdgen_runtime__Oj_run.error_with_line in file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from Semgrep_output_v1_j.read_function_call.(fun) in file "OSS/src/rule/semgrep_output_v1_j.ml", line 28657, characters 26-107
Called from RPC.handle_single_request in file "OSS/src/rpc/RPC.ml", line 123, characters 13-67
And the following Python error at the end:
signatures/infostealers.yaml: Traceback (most recent call last):
File "/home/kamil/.pyenv/versions/3.10.8/envs/semgrep-rules/lib/python3.10/site-packages/semgrep/test.py", line 300, in invoke_semgrep_multi
output = semgrep.run_scan.run_scan_and_return_json(config, targets, **kwargs)
File "/home/kamil/.pyenv/versions/3.10.8/envs/semgrep-rules/lib/python3.10/site-packages/semgrep/run_scan.py", line 1034, in run_scan_and_return_json
return json.loads(outputs[0][1]) # type: ignore
File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The issue seems to repeat for every Semgrep rules file I have. At the same time, scanning with the same rule files works without any trouble.
To Reproduce
For me, it looks like any semgrep scan --test run using versions 1.99 or 1.101 (I've tested those two) results in the issue. Unfortunately, I don't have a minimal reproducible rule file to share at the moment. Everything works well in 1.94.
Expected behavior Tests are run.
What is the priority of the bug to you?
- [ ] P0: blocking your adoption of Semgrep or workflow
- [x] P1: important to fix or quite annoying
- [ ] P2: regular bug that should get fixed
Environment I use Semgrep installed via PyPI
Use case What will fixing this bug enable for you?
Without fixing, I'll keep using Semgrep 1.94 locally.
I'll be happy to fix this if you can attach some example test files (both the rule and the test target).
Hey, thanks for the answer, and sorry for the delay - I didn't have time to look at it. I've now found a reproducible example, and moreover, it's not only about the tests but about the JSON output in general. I find it extremely weird and am also trying to find out whether anything in my environment could influence it.
So, the rule:
rules:
  - id: exec-usage
    pattern-either:
      - pattern: eval(...)
      - pattern: exec(...)
    message: eval/exec usage
    severity: INFO
    languages:
      - python
The code:
# ruleid: exec-usage
exec("aaaa")
And the runs:
$ semgrep --version
1.107.0
$ semgrep scan -c min.yaml min.py
...
Ran 1 rule on 1 file: 1 finding.
$ semgrep scan --json -c min.yaml min.py
...
RPC response indicated an error: Error parsing RPC request:
Atdgen_runtime.Oj_run.Error("Line 1:\nUnsupported variant \"Json\"")
Raised at Atdgen_runtime__Oj_run.error_with_line in file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from Semgrep_output_v1_j.read_function_call.(fun) in file "OSS/src/rule/semgrep_output_v1_j.ml", line 28657, characters 26-107
Called from RPC.handle_single_request in file "OSS/src/rpc/RPC.ml", line 123, characters 13-67
<ERROR: missing output>
...
Ran 1 rule on 1 file: 1 finding.
$ semgrep scan --test -c min.yaml min.py
...
RPC response indicated an error: Error parsing RPC request:
Atdgen_runtime.Oj_run.Error("Line 1:\nUnsupported variant \"Json\"")
Raised at Atdgen_runtime__Oj_run.error_with_line in file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from Semgrep_output_v1_j.read_function_call.(fun) in file "OSS/src/rule/semgrep_output_v1_j.ml", line 28657, characters 26-107
Called from RPC.handle_single_request in file "OSS/src/rpc/RPC.ml", line 123, characters 13-67
No unit tests found. See https://semgrep.dev/docs/writing-rules/testing-rules
No tests for fixes found.
--------------------------------------------------------------------------------
The following config files produced errors:
min.yaml: Traceback (most recent call last):
File "/home/kamil/.pyenv/versions/3.10.8/envs/semgrep-rules/lib/python3.10/site-packages/semgrep/test.py", line 300, in invoke_semgrep_multi
output = semgrep.run_scan.run_scan_and_return_json(
File "/home/kamil/.pyenv/versions/3.10.8/envs/semgrep-rules/lib/python3.10/site-packages/semgrep/run_scan.py", line 1085, in run_scan_and_return_json
return json.loads(outputs[0][1]) # type: ignore
File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/kamil/.pyenv/versions/3.10.8/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
$ pip install "semgrep==1.94.0"
...
$ semgrep --version
1.94.0
⏫ A new version of Semgrep is available. See https://semgrep.dev/docs/upgrading
$ semgrep scan --test -c min.yaml min.py
...
1/1: ✓ All tests passed
No tests for fixes found.
$ semgrep scan --json -c min.yaml min.py
...
{"errors": [], "interfile_languages_used": [], "paths": {"scanned": ["min.py"]}, "results": [{"check_id": "exec-usage", "end": {"col": 13, "line": 2, "offset": 33}, "extra": {"engine_kind": "OSS", "fingerprint": "77f096e04b4fe7aa8378ad4101772d9d8fe65b8519e191601ba524e71016c71f96f3796295db0adfc2e69ca345d40040237222744ecaca866f19d6acb4636e85_0", "is_ignored": false, "lines": "exec(\"aaaa\")", "message": "eval/exec usage", "metadata": {}, "metavars": {}, "severity": "INFO", "validation_state": "NO_VALIDATOR"}, "path": "min.py", "start": {"col": 1, "line": 2, "offset": 21}}], "skipped_rules": [], "version": "1.94.0"}
...
Ran 1 rule on 1 file: 1 finding.
⏫ A new version of Semgrep is available. See https://semgrep.dev/docs/upgrading
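In the 1.107.0 runs above, the Python traceback looks like a secondary symptom: the RPC error leaves no JSON to parse, and json.loads then fails on whatever text it is given. A minimal sketch, assuming the text handed to json.loads is either empty or the `<ERROR: missing output>` placeholder seen in the --json run (which of the two it is has not been confirmed here); try_parse is a hypothetical helper, not part of Semgrep:

```python
import json


def try_parse(raw: str) -> str:
    """Return 'ok' if raw is valid JSON, else the JSONDecodeError message."""
    try:
        json.loads(raw)
        return "ok"
    except json.JSONDecodeError as err:
        return str(err)


# Both candidates are assumptions about what the failing run passed along.
for candidate in ("", "<ERROR: missing output>"):
    print(repr(candidate), "->", try_parse(candidate))
```

Both inputs fail at the very first character, which matches the "line 1 column 1 (char 0)" position in the traceback, so the JSONDecodeError by itself says little beyond "the output was not JSON".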
A little about my environment:
$ python --version
Python 3.10.8
$ pip freeze
attrs==23.2.0
boltons==21.0.0
bracex==2.4
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
click-option-group==0.5.6
colorama==0.4.6
defusedxml==0.7.1
Deprecated==1.2.14
exceptiongroup==1.2.1
face==22.0.0
glom==22.1.0
googleapis-common-protos==1.63.2
idna==3.7
importlib_metadata==7.1.0
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
markdown-it-py==3.0.0
mdurl==0.1.2
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-http==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-requests==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
packaging==24.0
peewee==3.17.5
protobuf==4.25.4
Pygments==2.18.0
PyYAML==6.0.2
referencing==0.35.1
requests==2.32.2
rich==13.5.3
rpds-py==0.18.1
ruamel.yaml==0.17.40
ruamel.yaml.clib==0.2.8
semgrep==1.94.0
tomli==2.0.1
typing_extensions==4.12.0
urllib3==2.2.1
wcmatch==8.5.2
wrapt==1.16.0
zipp==3.19.2
I mean, it really looks related to the Semgrep version, but I cannot imagine that no one else has noticed issues with JSON output or tests so far, so I suspect that something on my machine influences the results.
Ehh, I'm almost sure it's something with my environment. It works in a clean Docker container. I'm leaving this open because you may have some idea what's going on, but given that it works in a container even when installing all dependency packages at the versions listed above, my environment must be totally broken.
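One way to double-check that the container and the local environment really have the same packages is to diff two pip freeze listings. A minimal sketch, assuming both listings only contain name==version lines; parse_freeze and diff_freeze are hypothetical helpers, and the sample strings are illustrative rather than taken from a real clean environment:

```python
def parse_freeze(text: str) -> dict:
    """Map normalized package name -> version for 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a plain pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Treat PyYAML/pyyaml and importlib_metadata/importlib-metadata alike.
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def diff_freeze(a: dict, b: dict) -> dict:
    """Return {name: (version_in_a, version_in_b)} for packages that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


local = parse_freeze("semgrep==1.94.0\nPyYAML==6.0.2\nimportlib_metadata==7.1.0\n")
clean = parse_freeze("semgrep==1.107.0\npyyaml==6.0.2\n")
print(diff_freeze(local, clean))
```

This only covers Python distributions that pip knows about; files that pip does not track would not show up in either listing.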
I just ran into a similar issue:
RPC response indicated an error: Error parsing RPC request:
Atdgen_runtime.Oj_run.Error("Line 1:\nUnsupported variant \"CallGetTargets\"")
Raised at Atdgen_runtime__Oj_run.error_with_line in file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from RPC.handle_single_request in file "OSS/src/rpc/RPC.ml", line 96, characters 13-67
Failed to obtain target files from semgrep-core
However, I never got a json.decoder.JSONDecodeError. The semgrep command just silently failed to produce results, even for files with known findings (e.g. rule tests). I suspect it has something to do with the Semgrep Pro semgrep-core binary. It could be a version mismatch between what semgrep expects and what the core binary produces 🤷.
Anyway, I was able to fix it by removing semgrep-core-proprietary. The weird part is that I wasn't using any Pro features, so I'm not sure why that binary would be invoked and cause issues.
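For anyone who wants to check whether a leftover semgrep-core-proprietary binary is present before deleting anything, here is a minimal sketch. It assumes (not confirmed in this thread) that a PyPI install keeps its core binaries in a bin directory inside the semgrep package; semgrep_bin_dir, find_core_binaries, and report are hypothetical helper names:

```python
import shutil
from importlib.util import find_spec
from pathlib import Path
from typing import Dict, Optional

CORE_NAMES = ("semgrep-core", "semgrep-core-proprietary")


def semgrep_bin_dir() -> Optional[Path]:
    """Guess the package bin directory (assumed layout: <semgrep package>/bin)."""
    spec = find_spec("semgrep")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent / "bin"


def find_core_binaries(bin_dir: Path) -> Dict[str, Optional[Path]]:
    """Map each core binary name to its path in bin_dir, or None if absent."""
    return {
        name: (bin_dir / name) if (bin_dir / name).is_file() else None
        for name in CORE_NAMES
    }


def report() -> None:
    """Print where each core binary was found, if anywhere."""
    bin_dir = semgrep_bin_dir()
    if bin_dir is None:
        print("semgrep package not found in this environment")
    else:
        for name, path in find_core_binaries(bin_dir).items():
            print(f"{name}: {path or 'not present'} (package bin dir)")
    # Also check PATH, in case a binary was installed outside the package.
    for name in CORE_NAMES:
        print(f"{name}: {shutil.which(name) or 'not present'} (PATH)")


report()
```

If semgrep-core-proprietary shows up in an environment where Pro features were never intentionally installed, that would be consistent with the version-mismatch guess above, though the sketch cannot confirm which binary semgrep actually invokes.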