libyang icon indicating copy to clipboard operation
libyang copied to clipboard

YANG patterns (regular expressions) not fully implemented

Open cabo opened this issue 3 years ago • 15 comments

src/schema_compile_node.c does not seem to implement YANG patterns fully.

E.g., \d and \w do not have the semantics given to them in Section 9.4.5 of RFC 7950.

Repeat:

Go to: https://yangcatalog.org/yangre

Check both YANG validate and W3C validate checkboxes

Try:

pattern test
\d\d\d\d ২০২২
\w ä

I do not understand why yangcatalog even offers two different validations, as they are defined to be the same.

cabo avatar Jul 08 '22 15:07 cabo

It seems almost none of these short character classes are supported in the way XML Schema defines them as their meaning differs from Perl regexes (which is what we mostly support by using PCRE2). That is quite unfortunate because I do not really see a simple way of improving the situation.

The only library that should support these regexes is libxml2, which our previous library used to depend on and we do not want to repeat that mistake. But if we are to keep using PCRE2, there would have to be more manual pattern preprocessing involved to transform these patterns into something PCRE2 understands. Despite not liking this, it seems like the best solution so unless you have a better idea, it will be implemented at some point once we have some spare time.

michalvasko avatar Jul 15 '22 09:07 michalvasko

I don't think YANG people actually want to use \d or \w in XSD semantics. So the recommendation should be to replace these by [0-9] and [...a-zA-Z0-9]. Note that the iregexp draft simply excludes \d and \w, as it is hard to get interoperability with these.

cabo avatar Jul 15 '22 09:07 cabo

So the recommendation should be to replace these by [0-9] and [...a-zA-Z0-9].

That is exactly the Perl semantics of these. So what exactly is the problem and what would you expect the solution to be?

michalvasko avatar Jul 15 '22 09:07 michalvasko

YANG defines patterns differently from what you implement. So instead of saying that YANG and W3C have different results (they can't, as one is defined in terms of the other), you need to say that the regexp isn't interoperable.

cabo avatar Jul 15 '22 09:07 cabo

YANG defines patterns differently from what you implement.

Well, I would hope it all works except for these character classes.

So instead of saying that YANG and W3C have different results (they can't, as one is defined in terms of the other), you need to say that the regexp isn't interoperable.

Are you referring to the YANG catalog? We do not maintain it, they are just using our yangre for testing the YANG patterns.

michalvasko avatar Jul 15 '22 09:07 michalvasko

What I'm trying to say is that you can't silently compile \d into [0-9]. Because of existing practice, you can't simply fail \d, but you should warn.

cabo avatar Jul 15 '22 09:07 cabo

No problem, warning added.

michalvasko avatar Jul 15 '22 12:07 michalvasko

It turns out that ietf-yang-library is using \d as [0-9], which it seems is wrong but based on that I decided to change the warning into a verbose message.

michalvasko avatar Jul 15 '22 13:07 michalvasko

That's a vicious cycle: Bugs like ietf-yang-library happen because the tools don't report the bugs.

cabo avatar Jul 15 '22 16:07 cabo

Too late to change anything about that, anyway, the best we could hope for is an errata. Before you mentioned it, I, for some reason, assumed that \d works the way it does in Perl selecting only ASCII numbers. Seems the situation is the same with the authors of ietf-yang-library.

michalvasko avatar Jul 18 '22 11:07 michalvasko

Exactly, and that common misunderstanding is a reason why a warning is needed.

cabo avatar Jul 18 '22 11:07 cabo

Yes, ideally. In practice it would lead to most people turning off warnings altogether (making things worse) and us getting spammed by others asking about the warning and how to fix it. I have no reason to want either of that.

michalvasko avatar Jul 18 '22 11:07 michalvasko

I can understand why you do not want to be spearheading cleaning up that part of the YANG ecosystem. It still would be useful to have a way to get warnings for people who write new YANG modules.

cabo avatar Jul 18 '22 12:07 cabo

Only now have I found out that PCRE2 has a flag for Unicode support of \d, \w and so on, so I have just used it.

michalvasko avatar Jul 22 '22 07:07 michalvasko

Great! Looking forward to test this.

cabo avatar Jul 26 '22 09:07 cabo