
Complex object with dicts and arrays

Open devBioS opened this issue 2 years ago • 8 comments

Came across this great library while searching for Python rule engines. This project would fit my needs completely if I could traverse arrays with the dot syntax (like issue #30, but deeper and nested). This kind of syntax would be easy enough to give other people the chance to write rules for specific actions without having to know Python.

I have dicts with arrays of dicts that have varying content, and I need to evaluate whether a specific path is present, ignoring the position in the arrays:

{"event": { "title": "1 computer made a problem", "startDate": "20220502", "endDate": "20220502", "created": 1651528631972, "creatorId": None, "internaldata": [ { "type": "USER", "details": { "firstName": "first1", "lastName": "last2" } }, { "type": "COMPUTER", "details": { "fqdn": "computer1.domain.net", "lansite": "Munich" } }
], "items": [ { "type": "EVENT", "computerinfo": { "resources": [ {"userassigments": "data1"}, {"companyassigned": "Yes"}, {"otherdata": "data2"} ] } } ] } }

I could do that with your library now: Rule('event.title =~ ".*made a problem$" and event.items[0].computerinfo.resources[1].companyassigned == "Yes"')

Because the data is not always the same and the position of the dicts within the arrays changes, I would need to somehow traverse the arrays within the dicts to check whether specific data is present (dict keys are always named the same), e.g.:

Rule('event.title =~ ".*made a problem$" and event.items[*].computerinfo.resources[*].companyassigned == "Yes"')

Is that possible somehow, or could it be added to the library?

devBioS avatar May 04 '22 14:05 devBioS

Yeah you'd want to use a rule like this: event.title and [item for item in event.items if item&['computerinfo'] and [resource for resource in item['computerinfo']['resources'] if resource&['companyassigned'] == 'Yes']]

That leverages nested comprehension along with the safe navigation operator to avoid key lookup errors. Arrays that are not empty eval to True just like they do in Python. Alternatively, you could be more explicit by checking the length of the array by using the length or is_empty attribute.
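For readers more familiar with Python than with the rule syntax, the rule above can be sketched in plain Python. This is only an illustration of the logic, not how the library evaluates it: `dict.get()` stands in for the safe navigation operator, and `thing` is a trimmed copy of the sample data from this issue.

```python
# Trimmed copy of the sample data from this issue.
thing = {
    "event": {
        "title": "1 computer made a problem",
        "items": [
            {
                "type": "EVENT",
                "computerinfo": {
                    "resources": [
                        {"userassigments": "data1"},
                        {"companyassigned": "Yes"},
                        {"otherdata": "data2"},
                    ]
                },
            }
        ],
    }
}

event = thing["event"]

# Outer comprehension: keep items that have "computerinfo" and at least one
# resource whose "companyassigned" value is "Yes". dict.get() returns None
# for a missing key instead of raising, like the safe navigation operator.
matching_items = [
    item
    for item in event["items"]
    if item.get("computerinfo")
    and [
        resource
        for resource in item["computerinfo"]["resources"]
        if resource.get("companyassigned") == "Yes"
    ]
]

# A non-empty title and a non-empty list are both truthy.
matched = bool(event["title"] and matching_items)
print(matched)
```

The inner list is only used for its truthiness, which is why an empty result makes the whole item drop out.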

That would work for your use case as is. FWIW though, I'd recommend doing some normalization on the data to make it easier to write rules. Specifically, if items.computerinfo.resources were a dictionary instead of an array (since the keys are unique), it'd be easier to write rules.
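A minimal sketch of that kind of normalization, assuming every entry in the array is a small dict and the keys really are unique. `merge_resources` is a hypothetical helper written for this example, not part of the library; it raises on a duplicate key rather than silently overwriting a value.

```python
def merge_resources(resources):
    """Merge a list of dicts into one dict (hypothetical helper).

    Raises ValueError on a duplicate key instead of overwriting it.
    """
    merged = {}
    for entry in resources:
        for key, value in entry.items():
            if key in merged:
                raise ValueError(f"duplicate key: {key!r}")
            merged[key] = value
    return merged


resources = [
    {"userassigments": "data1"},
    {"companyassigned": "Yes"},
    {"otherdata": "data2"},
]
normalized = merge_resources(resources)
print(normalized["companyassigned"])
```

With the resources merged this way, the inner comprehension is no longer needed to find `companyassigned`; it becomes a plain key lookup.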

Also you can use the debug_repl module to experiment with this.

(rule-engine)   : rule-engine: 11:02:03 rule-engine cat repl_setup.py
# issue #38
thing = {
    "event": {
        "title": "1 computer made a problem",
        "startDate": "20220502",
        "endDate": "20220502",
        "created": 1651528631972,
        "creatorId": None,
        "internaldata": [
            { "type": "USER", "details": { "firstName": "first1", "lastName": "last2" } },
            { "type": "COMPUTER", "details": { "fqdn": "computer1.domain.net", "lansite": "Munich" } }
        ],
        "items": [
            {
                "type": "EVENT",
                "computerinfo": {
                    "resources": [ {"userassigments": "data1"}, {"companyassigned": "Yes"}, {"otherdata": "data2"} ]
                }
            }
        ]
    }
}
(rule-engine)   : rule-engine: 11:02:05 rule-engine PYTHONPATH=$(pwd)/lib python -m rule_engine.debug_repl --edit-file repl_setup.py --debug
executing: repl_setup.py
rule > event.title and [item for item in event.items if item&['computerinfo'] and [resource for resource in item['computerinfo']['resources'] if resource&['companyassigned'] == 'Yes']]
result: 
True
rule >

zeroSteiner avatar May 04 '22 15:05 zeroSteiner

Thanks a lot for the explanation! I did some tests and it looks like it works for this case. I hadn't gotten that far myself :)

I just cannot get the people who should write such rules to understand syntax like this :D

I already tried to normalize the data before opening this issue to get rid of the arrays, but in most cases I have the same keys with only small differences in values, which would overwrite each other during normalization.

In my real scenario I have dicts of arrays of dicts about 8-10 levels deep. I think writing rules for this would be too complex for my users. It would be easier and more readable if they could just put a placeholder like * or # into the brackets to say "any" for arrays.

Nevertheless thank you very much!

devBioS avatar May 04 '22 18:05 devBioS

Let me think about it. I'll admit I like the syntax you're proposing. I think I could make it backwards compatible and relatively intuitive if I used # instead of *.

zeroSteiner avatar May 04 '22 18:05 zeroSteiner

That would be awesome, and it would make this library the only one I'm aware of that can traverse arrays and evaluate the dicts nested inside them.

If a user could create a rule like this:

Rule('event.title =~ ".*made a problem$" and event.items[#].computerinfo.resources[#].companyassigned == "Yes" and event.internaldata[#].details.lansite == "Munich" ')

That would be very intuitive and easy to read, and non-programmers could use it easily because it resembles directory traversal syntax like

ls /mnt/*/asdf/*/test

For arrays whose elements lack such keys, you could still apply the context's default value: if it is set to None, dicts following an array that don't have the requested keys are ignored. If no default is set, an exception is raised that the caller can react to.

It would be cool to see this in your library. I tried to find some starting points for this myself, but my debugging environment doesn't seem to work correctly while the AST is built and operators are selected. Maybe it's some kind of threading problem that won't give me the full call stack.
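To make the proposed semantics concrete, here is a rough plain-Python sketch of how a `[#]` path could be resolved. Everything in it (`parse_path`, `find_all`, the `ignore_missing` flag) is hypothetical and written for this comment; it is not the library's API, and the flag only approximates the "default set to None" behaviour described above.

```python
import re

WILDCARD = object()  # marker for "[#]": any element of an array
_TOKEN = re.compile(r"([^.\[\]]+)|\[(#|\d+)\]")


def parse_path(path):
    """Split 'a.b[#].c[0]' into ['a', 'b', WILDCARD, 'c', 0]."""
    tokens = []
    for name, index in _TOKEN.findall(path):
        if name:
            tokens.append(name)
        elif index == "#":
            tokens.append(WILDCARD)
        else:
            tokens.append(int(index))
    return tokens


def find_all(data, path, ignore_missing=False):
    """Return every value reachable via path; '[#]' fans out over a list.

    With ignore_missing=True, branches lacking a key are skipped;
    otherwise the lookup error propagates to the caller.
    """
    current = [data]
    for token in parse_path(path):
        found = []
        for node in current:
            try:
                if token is WILDCARD:
                    if not isinstance(node, (list, tuple)):
                        raise TypeError("'[#]' applied to a non-array value")
                    found.extend(node)
                else:
                    found.append(node[token])
            except (KeyError, IndexError, TypeError):
                if not ignore_missing:
                    raise
        current = found
    return current


thing = {
    "event": {
        "internaldata": [
            {"type": "USER", "details": {"firstName": "first1"}},
            {"type": "COMPUTER", "details": {"lansite": "Munich"}},
        ],
        "items": [
            {"computerinfo": {"resources": [
                {"userassigments": "data1"},
                {"companyassigned": "Yes"},
            ]}}
        ],
    }
}

path = "event.items[#].computerinfo.resources[#].companyassigned"
values = find_all(thing, path, ignore_missing=True)
print("Yes" in values)
```

Calling `find_all(thing, path)` without `ignore_missing=True` raises a `KeyError` on the first resource that has no `companyassigned` key, which corresponds to the "no default set" case.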

Anyway, if I could help let me know :D

devBioS avatar May 04 '22 20:05 devBioS

@zeroSteiner This library is impressive, thanks for maintaining it. Does it support built-in math functions called on an iterable, e.g.:

data = {"a": {"b": [{"count": 19}, {"count": 18}]}}
r = rule_engine.Rule('sum([v.count for v in a.b]) == 37')

vatsramkesh avatar Jun 13 '23 11:06 vatsramkesh

@vatsramkesh No, see #58 which is a duplicate of #32.

zeroSteiner avatar Jun 13 '23 12:06 zeroSteiner

Came here also looking for array[*].

In my case, I'm looking to work with some Kubernetes and other objects. In a contrived example, I'd like to know whether any of the elements in the metadata.owner_references list has a member kind == 'ReplicaSet'

{'metadata': {
    'owner_references': [{'api_version': 'apps/v1',
                       'block_owner_deletion': True,
                       'controller': True,
                       'kind': 'ReplicaSet',
                       'name': 'argo-rollouts'}],
}}

In this case the Kubernetes API returns objects, so I've loaded the attribute resolver, which works quite well.

>>> ctx = rule_engine.Context(resolver=rule_engine.resolve_attribute)
>>> r = Rule("metadata.owner_references[0].kind == 'ReplicaSet'", context = ctx)
>>> r.matches(p)
True

However, there may be no items or multiple items in this array, so traversing them all would be valuable. Ideally something like

r = Rule("metadata.owner_references[*].kind == 'ReplicaSet'", context = ctx)

Rego has built-in support for walking all members of any collection, which would result in a similar rule

metadata.owner_references[_].kind = 'ReplicaSet'
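For comparison, the check I'm after looks like this in plain Python. It is a sketch only: `SimpleNamespace` stands in for the Kubernetes client objects, and `owned_by_kind` is a hypothetical helper, not something from the library or the Kubernetes client. Presumably the comprehension form shown earlier in this thread, along the lines of `[ref for ref in metadata.owner_references if ref.kind == 'ReplicaSet']`, could express the same thing as a rule today, though I have not verified that here.

```python
from types import SimpleNamespace

# Stand-in for an object returned by the Kubernetes API.
pod = SimpleNamespace(
    metadata=SimpleNamespace(
        owner_references=[
            SimpleNamespace(
                api_version="apps/v1",
                block_owner_deletion=True,
                controller=True,
                kind="ReplicaSet",
                name="argo-rollouts",
            )
        ]
    )
)


def owned_by_kind(obj, kind):
    """True if any owner reference has the given kind (hypothetical helper).

    Handles a missing, None, empty, or multi-element owner_references list.
    """
    refs = getattr(obj.metadata, "owner_references", None) or []
    return any(getattr(ref, "kind", None) == kind for ref in refs)


print(owned_by_kind(pod, "ReplicaSet"))
```

Unlike the `[0]` index, this does not fail when the list is empty and does not miss a match at a later position.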

xarses avatar Jun 26 '23 05:06 xarses