intelmq Validation and sanitation of bot parameters (typing)

Since #1729 (version 3.0.0) all bots have built-in default values and typing information for all parameters.

The default values are in use by the bots if the configuration does not contain that parameter, and by the API/Manager to show as initial value.

The Manager parses and displays all parameters' types correctly, but the culprit is the saving. Especially problematic are parameters left empty (resulting in an empty string) while the bots expect, for example, a boolean value. Even more dangerous are for example entered values of true, which results in the string "true", not the boolean True; and false resulting in "false", which evaluates to the boolean true value!

This is a big usability problem for IntelMQ users and leads not only to confusion but also to mis-routed data.

There are two independet steps that IntelMQ can make to improve the sitatution:

Make the Manager type-aware. Show the accepted type to the user, and only allow the input of allowed values. E.g. for boolean values show a radio box/drop down with true and false. And also send the value in the correct type to the backend (API).
Validate and sanitize the types in the bots. For example, for boolean parameters sanitize "false" to false, for intergers convert "10" to 10, for lists transform "foo,bar" to ["foo", "bar"].

What do you think of this topic? Is someone willing and able to implement this or support work on this topic?

Related: https://github.com/certtools/intelmq-manager/issues/294 https://github.com/certtools/intelmq-manager/issues/81 Issues/PRs that could have been prevented: #2536 #2481 #2075 #2495 (and many more unreported ones)

Nov 29 '24 20:11 sebix

I was thinking a few times about that. Actually, I think we should go in the direction of descriptors, Pydantic or other forms of automated validation. I'd think the most about mix of descriptors for complex cases and automated type-based validation, but it's just an idea.

The Manager should know more about the data - maybe e.g. using Pydantic would let us generate a JSON Schema for configuration, that manager could get? I also think the Manager should not need to dump the whole configuration, but we should have an API exposing per-bot config (with a proper validation).

Dec 02 '24 12:12 kamil-certat

The easiest variant, but also with the least dependencies and no changes in the bots, would be to get the types with typing.get_type_hints(bot_instance) when loading the parameters in intelmq.lib.bot.Bot.__load_configuration and doing the conversion there.

Apr 24 '25 06:04 sebix

I'd be very careful with that:

the typing is not always entirely correct - we don't want to surprisingly break something,
we could not check "required" parameters as the parameters and bot attributes are mixed,
not a problem, but good for the future - this will not give us a possibility for more complex checks (is path valid? are parameters in range? are report types valid?)

Apr 28 '25 08:04 kamil-certat

the typing is not always entirely correct - we don't want to surprisingly break something,

That would be a bug of the type declaration, not a bug of the type sanitation

we could not check "required" parameters as the parameters and bot attributes are mixed,

All bot attributes not starting with an underscore are parameters.

Apr 28 '25 08:04 sebix