insights-core icon indicating copy to clipboard operation
insights-core copied to clipboard

Enhance that there are two or more values of the same key in a dictio…

Open shlao opened this issue 4 years ago • 12 comments

Updated

insights/parsers/__init__.py

Here are the codes of parser SystemctlShowServiceAll:

 19 class SystemctlShowServiceAll(CommandParser, dict):
 79     def parse_content(self, content): 
 90         for s, e in idx_list:
 91             data = split_kv_pairs(content[s:e], use_partition=False)
 92             name = data.get('Names', data.get('Id'))
 93             if not name:
 94                 raise ParseException('"Names" or "Id" not found!')
 95             self[name] = dict((k, v) for k, v in data.items() if v)

Here are the testing data (on RHEL8):

# systemctl show sshd.service  > /tmp/sshd.txt
# cat /tmp/sshd.txt | grep Env
EnvironmentFiles=/etc/crypto-policies/back-ends/opensshserver.config (ignore_errors=yes)
EnvironmentFiles=/etc/sysconfig/sshd (ignore_errors=yes)

the function split_kv_pairs will overwrite the dictionary value of the same key EnvironmentFiles and only save the last value.

Signed-off-by: shlao [email protected]

shlao avatar Aug 19 '20 07:08 shlao

If the key has existed then it is well to combine two dictionary values. for example, what we get from expression service_conf['sshd.service']['EnvironmentFiles'] is: ['/etc/crypto-policies/back-ends/opensshserver.config', '/etc/sysconfig/sshd']

shlao avatar Aug 19 '20 07:08 shlao

@xiangce @bfahr Will you help me to review this?

shlao avatar Aug 20 '20 02:08 shlao

@bfahr - This change looks good to me, please help double confirm.

xiangce avatar Aug 27 '20 06:08 xiangce

@bfahr Will you help me to review this PR, thanks!

shlao avatar Sep 04 '20 03:09 shlao

The proposed impl would require users to test every value to see whether it's a list or not. I prefer to keep the existing impl and handle the duplicate key use case separately, either in a different function or with a keyword argument to the current one. e.g. if allow_duplicates is true, all values should be lists for consistency. Otherwise, go with the current behavior where the last duplicate key wins.

csams avatar Sep 09 '20 22:09 csams

The proposed impl would require users to test every value to see whether it's a list or not. I prefer to keep the existing impl and handle the duplicate key use case separately, either in a different function or with a keyword argument to the current one. e.g. if allow_duplicates is true, all values should be lists for consistency. Otherwise, go with the current behavior where the last duplicate key wins.

I like the allow_duplicates idea. Thank you Chris. @shlao how do you think? Would you please update per Chris's proposal?

xiangce avatar Sep 10 '20 00:09 xiangce

@csams Good idea. @xiangce Will add a parameter allow_duplicates to make the parser backward-compatible.

shlao avatar Sep 10 '20 01:09 shlao

@csams if all values should be lists for consistency:

        >>> split_kv_pairs(lines, allow_duplicates=True)
        {'keyword2': ['value2a=True, value2b=100M'], 'keyword1': ['value1', 'value2']}

The the we need to change a lot in parser SystemctlShowServiceAll:

 91             data = split_kv_pairs(content[s:e], use_partition=False, allow_duplicates=True)
 92             name = data.get('Names', data.get('Id'))
 95             self[name] = dict((k, v) for k, v in data.items() if v)

Line 95 will hit this error TypeError: unhashable type: 'list'

shlao avatar Sep 10 '20 10:09 shlao

@csams So I change it to store values in a list when there are two or more lines that have the same key

 _v = kv_pairs[k]
 _v = [_v] if not isinstance(_v, list) else _v

shlao avatar Sep 10 '20 11:09 shlao

@csams if all values should be lists for consistency:

        >>> split_kv_pairs(lines, allow_duplicates=True)
        {'keyword2': ['value2a=True, value2b=100M'], 'keyword1': ['value1', 'value2']}

The the we need to change a lot in parser SystemctlShowServiceAll:

 91             data = split_kv_pairs(content[s:e], use_partition=False, allow_duplicates=True)
 92             name = data.get('Names', data.get('Id'))
 95             self[name] = dict((k, v) for k, v in data.items() if v)

Line 95 will hit this error TypeError: unhashable type: 'list'

@shlao - This is because the name is also wrapped into a list, it can be resolved by:

            # There must be one and only one "name" as the key
            if not name or len(name) > 1:
                raise ParseException('"Names" or "Id" not found!')
            name = name[0]
            self[name] = dict((k, v) for k, v in data.items() if v)

As Chris mentioned, if only store the keys with multiple values into a list, it would require users to test every value to see whether it's a list or not. I don't think this is what we want.

xiangce avatar Sep 11 '20 05:09 xiangce

@xiangce @csams You can see what are changes in the latest pr. Beside that, we need to change the rules using the Parsers which invoke function split_kv_pairs:

postgresql_on_external_disk
httpd_fails_to_start_during_boot
sssd_network_user_auth_issue

shlao avatar Sep 11 '20 07:09 shlao

@shlao - Yeap, that's the impact of this update (applying the "allow_duplicates=Ture" to this parser). It's reasonable for me to make this parser more flexible to support these duplicated keys (like a bugfix). And we'd better use this parser in this way (I think we should update the existing rules). @bfahr - How do you think?

xiangce avatar Sep 11 '20 08:09 xiangce

Can one of the admins verify this patch?

candlepin-bot avatar Apr 26 '24 15:04 candlepin-bot

I close this one because there is no requirements now.

shlao avatar Apr 27 '24 08:04 shlao