cppcheck icon indicating copy to clipboard operation
cppcheck copied to clipboard

JSON schema validation proposal

Open mvds00 opened this issue 1 year ago • 8 comments

add test/cli/test-json.py: Schema validation for JSON files.

test-json.py uses jsonschema (https://pypi.org/project/jsonschema/) to validate JSON files against schemas, optionally with custom format checkers.

All JSON files in the project must be accounted for in test-json.schema_mapping(), either as a schema, or as a file to be validated against a schema. The JSON schemas are implicitly validated against the most recent metaschema available in jsonschema.

This commit adds schemas for JSON addon files and for namingng config files.

TODO:

  • ~~runtime schema validation~~ done
  • ~~remove current ad-hoc validation~~ done
  • generate documentation from schemas

Feedback is more than welcome.

mvds00 avatar Jan 06 '24 15:01 mvds00

I don't know. I am not against validating json in theory.. and you seem to have found a reasonable library. the schema is some standard format I assume?

Yes, this seems to be a (de facto) standard, see https://json-schema.org/specification

the output messages are more convoluted as far as I see.

They are as it stands now, but that is not a function of having schema validation. Rather it is a function of the validator.

on the other hand we do need to improve our json validation..

I think so too. The project will be more robust as all embedded JSON is validated. And after validating, the code can safely assume all is according to spec, reducing the amount of checking and error handling/reporting code.

mvds00 avatar Jan 08 '24 23:01 mvds00

Just some comment via looking at the implementation yet.

The validation is definitely fine in any way of the static configs.

In terms of the addon output we need to consider a potential performance impact compared to manual validation (which does not yet fully exist). It will surely be a drop into the ocean compared to what valueflow is doing but I still would like to have the fastest path possible for the actual core performance.

But maybe in that case it should be (also) implemented within the addon so it will make sure that it will only output valid data. Some addons also output debug messages we need to suppress so maybe the whole output should be piped through a layer and disallowing all print and logging (well - we could leverage that framework at some point) usage.

I won't dig too deep into it yet as the executor and language stuff as well as all other local changes are ahead of it on my list ... still.

firewave avatar Jan 09 '24 08:01 firewave

In terms of the addon output we need to consider a potential performance impact compared to manual validation (which does not yet fully exist). It will surely be a drop into the ocean compared to what valueflow is doing but I still would like to have the fastest path possible for the actual core performance.

For this reason I didn't address that yet. To me a first logical step would seem to make the addon output one big JSON, not a set of JSON objects on separate lines.

But maybe in that case it should be (also) implemented within the addon so it will make sure that it will only output valid data. Some addons also output debug messages we need to suppress so maybe the whole output should be piped through a layer and disallowing all print and logging (well - we could leverage that framework at some point) usage.

That would be less efficient as Python is way slower than C++ in almost every respect (unless we find a Cythonized JSON validator).

Maybe have one JSON object on stdout, and warnings/errors on stderr? This will require a rewrite of the code that executes processes, as the popen() call used in CppCheckExecutor::executeCommand() only captures stdout. This would be an improvement in itself.

I won't dig too deep into it yet as the executor and language stuff as well as all other local changes are ahead of it on my list ... still.

I would like to keep the interface between addon and cppcheck as-is for now, and first finish what's on the table now. A few updates to this PR are underway...

mvds00 avatar Jan 09 '24 14:01 mvds00

For this reason I didn't address that yet. To me a first logical step would seem to make the addon output one big JSON, not a set of JSON objects on separate lines.

As source files might be several megabytes that doesn't seems like a good idea as that would balloon the memory usage. Especially since we already have several caches which store these (still working on getting rid of those).

That would be less efficient as Python is way slower than C++ in almost every respect (unless we find a Cythonized JSON validator).

We are not using a very fast JSON library on the C++ side either. Also vanilla Python is making major strides in reclaiming the performance lost since 2.7.

That actually brings up an interesting point. We could pre-compile the addons for better performance.

I would like to keep the interface between addon and cppcheck as-is for now, and first finish what's on the table now. A few updates to this PR are underway...

Same. If we change it we need to provide backwards compatibility as people might be running their own addons. We should probably just introduce a new interface if we plan any changes.

firewave avatar Jan 09 '24 15:01 firewave

I would like to keep the interface between addon and cppcheck as-is for now, and first finish what's on the table now. A few updates to this PR are underway...

Same. If we change it we need to provide backwards compatibility as people might be running their own addons. We should probably just introduce a new interface if we plan any changes.

newline-separated-json-objects is actually an (informal) format, called JSONL. Sounds like something that is easy to formalize/verify/unittest in the interface between addons and cppcheck. Debugging information could still be emitted between objects without violating , e.g. as a quoted string or as an object with a defined structure.

mvds00 avatar Jan 11 '24 14:01 mvds00

heads up: #5870 will change the naming of the pytest-based test files

I still haven't looked at this. I currently am in the midst of finished up something else as will take a look afterwards.

firewave avatar Jan 12 '24 10:01 firewave

heads up: #5870 will change the naming of the pytest-based test files

I still haven't looked at this. I currently am in the midst of finished up something else as will take a look afterwards.

Thanks. There are some additional conflicts to solve, plus some additional improvements that I will push in the coming days.

mvds00 avatar Jan 16 '24 09:01 mvds00

Sorry for the late reply.

I hope to take a proper look this week. What about the additional changes you wanted to push?

firewave avatar May 27 '24 11:05 firewave