repology-updater icon indicating copy to clipboard operation
repology-updater copied to clipboard

Fixing cpe data split for cpe fields containing colon

Open Bluebottle-new opened this issue 3 years ago • 8 comments

The fields for projects whose vendor and/or project fields contain a colon, like e.g.: perl:archive-tar or perl:convert-asn1, are not correctly split from the given URI binding.

Currently fields with escaped colons like archive::tar_project are also split at the colon. With this pull request, the escaped colons are ignored and the URI binding is only split at unescaped colons.

Bluebottle-new avatar Apr 12 '21 21:04 Bluebottle-new

This doesn't seem to handle escaped backslash correctly:

>>> import re
>>> re.split(r"(?<!\\):", 'foo:bar')
['foo', 'bar']
>>> re.split(r"(?<!\\):", 'foo\\:bar')
['foo\\:bar']
>>> re.split(r"(?<!\\):", 'foo\\\\:bar')
['foo\\\\:bar']  # expected ['foo\\\\', 'bar']

We already have proper CPE splitting code in vulnupdater: https://github.com/repology/repology-vulnupdater/blob/8d8174768b97fc9ca1226a353ac5164b10dce050/vulnupdater/cpe.py#L36-L52 IMO it would be better to add it here as a plain function in repology/parsers/cpe.py and use in the parsing.

AMDmi3 avatar Apr 12 '21 22:04 AMDmi3

IMO it would be better to add it here as a plain function in repology/parsers/cpe.py and use in the parsing.

Fancy doing this? Also should be nice to have an example of misparsing to make sure it's fixed afterwards. I guess the issue manifests as some incorrect problem?

AMDmi3 avatar Apr 13 '21 08:04 AMDmi3

You are right, my change fixes the escaped colon but ignores the escaped backslash, and using a function like mentioned above would be more suitable.

I might be inclined to try implementing that function, although it might require some further changes in the gentoo parser, as the program uses fixed fields for the cpe information:

pkg.add_cpe(cpe[2], cpe[3])

which works for the currently used CPE2.2 format but would fail for the CPE2.3 format.

It would probably be a good idea to extend the parser to support both CPE URI formats. Although, going forward, I would also include support for the WFN format, which makes field assignments much easier and would allow the definition of cpe information in three ways:

cpe:/a:microsoft:internet_explorer:8.0.6001:beta
cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*
wfn:[part="o",vendor="microsoft",product="windows_vista",version="6\.0", update="sp1",edition=NA,language=NA,sw_edition="home_premium", target_sw=NA,target_hw="x64",other=NA]

Bluebottle-new avatar Apr 13 '21 09:04 Bluebottle-new

It depends on which format Gentoo (and other repos) currently uses. As far as I remember, we only need CPE string parsing for Gentoo (other repos with CPE info available provide it as separate fields), and Gentoo only uses 2.2 format. 2.3 format can be supported with the same splitting code with just an extra condition which determines the format by the starting components.

I would not over-engineer by adding support for wfn which is not used anywhere, but if we need it at some point, it would make sense to switch to third party module such as cpe or cpe_utils.

AMDmi3 avatar Apr 13 '21 11:04 AMDmi3

I can't speak for Gentoo, but at Liguros we currently provide the cpe information in the 2.2 format, which is the format we picked up from Gentoo. Switching over to 2.3 format would be a matter of a few minutes, as the information is generated automatically in our update pipeline. Thus we don't mind if we deliver the information in 2.2 or 2.3 format; even changing it to a more suitable format would not be out of the question.

For now I would try to change the gentoo parser, adding the splitting code, with the possibility to parse 2.2 or 2.3 format.

Bluebottle-new avatar Apr 13 '21 12:04 Bluebottle-new

OK, it seems that I finally satisfied the checks :)

I have added cpe.py with the parse function. gentoo.py is now using that function and should now support cpe information in 2.2 and 2.3 format.

A test file with some test cases was added as well.

Bluebottle-new avatar Apr 13 '21 18:04 Bluebottle-new

OK, with my little knowledge of python, I think I was able to add the parse as part of a data class.

Seems to work for the parser, although I am not so sure about the test cases.

Bluebottle-new avatar Apr 16 '21 21:04 Bluebottle-new

Hmm, the check currently fails in dependencies with:

 Unable to locate package postgresql-server-dev-13

Could you have a look at it?

Bluebottle-new avatar Apr 17 '21 07:04 Bluebottle-new