libyaml

%TAG prefix does not accept all characters in ns-uri-char production

Open gkellogg opened this issue 2 years ago • 2 comments

As noted in https://github.com/yaml/yaml-spec/issues/268#issuecomment-1208565027, Psych does not accept a %TAG prefix including a #, which seems to be due to the following code:

https://github.com/yaml/libyaml/blob/f8f760f7387d2cc56a2fc7b1be313a3bf3f7f58c/src/scanner.c#L2603-L2627

According to the YAML 1.2 Spec, ns-uri-char does include #, but the scanner does not accept it.

[39] ns-uri-char ::=
    (
      '%'
      [ns-hex-digit](https://yaml.org/spec/1.2.2/#rule-ns-hex-digit){2}
    )
  | [ns-word-char](https://yaml.org/spec/1.2.2/#rule-ns-word-char)
  | '#'
  | ';'
  | '/'
  | '?'
  | ':'
  | '@'
  | '&'
  | '='
  | '+'
  | '$'
  | ','
  | '_'
  | '.'
  | '!'
  | '~'
  | '*'
  | "'"
  | '('
  | ')'
  | '['
  | ']'
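
For reference, the production above can be sketched as a regular expression. This is only an illustration of the grammar, not libyaml's scanner logic; the names `NS_URI_CHAR`, `NS_URI`, and `is_ns_uri` are made up for this example, and ns-word-char is expanded by hand to digits, ASCII letters, and `-` as the spec defines it.

```python
import re

# ns-uri-char per YAML 1.2.2 rule [39]: either a percent-escape
# ('%' followed by two hex digits) or one of the listed literal characters.
# ns-word-char is written out here as [0-9A-Za-z-].
NS_URI_CHAR = r"(?:%[0-9A-Fa-f]{2}|[0-9A-Za-z\-#;/?:@&=+$,_.!~*'()\[\]])"

# One or more ns-uri-char, as used by c-verbatim-tag (ns-uri-char+).
NS_URI = re.compile(rf"{NS_URI_CHAR}+")


def is_ns_uri(text: str) -> bool:
    """Return True if every character of text fits ns-uri-char."""
    return NS_URI.fullmatch(text) is not None


if __name__ == "__main__":
    # Both the literal '#' and its percent-encoded form fit the grammar.
    print(is_ns_uri("http://www.w3.org/2001/XMLSchema#"))
    print(is_ns_uri("http://www.w3.org/2001/XMLSchema%23"))
```

Under this reading of the grammar, the literal `#` form and the `%23` form are both valid, so a scanner that only takes the second is stricter than the spec.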

This prevents creating a TAG line such as the following:

%TAG ! http://www.w3.org/2001/XMLSchema#

gkellogg, Aug 08 '22 21:08

As a workaround, %TAG ! http://www.w3.org/2001/XMLSchema%23 works, but is not ideal, and shouldn't be required based on the grammar.

gkellogg, Aug 16 '22 05:08

The scanning issue extends to inline tags as well. If you parse the following:

%TAG !xsd! http://www.w3.org/2001/XMLSchema%23
---
date: !xsd!date 2022-08-08

and re-serialize without the %TAG directive, you'll get the following:

date: !<http://www.w3.org/2001/XMLSchema%23date> 2022-08-08

Per the grammar, you should also be able to parse the following:

date: !<http://www.w3.org/2001/XMLSchema#date> 2022-08-08

However, it fails in the same way as reported for %TAG. In this case the relevant production is c-verbatim-tag, which includes ns-uri-char+, and the # is again rejected.

Working around this requires a pre-parsing step that replaces these characters as appropriate before parsing, and a matching step that restores them after serializing.
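
A rough sketch of such a pre- and post-processing step follows, written in Python with only the standard library. The helper names `encode_hash` and `decode_hash` and both regular expressions are assumptions made for illustration. The sketch only touches `%TAG` prefixes and `!<...>` verbatim tags, it does not try to skip text that merely looks like a tag inside quoted or block scalars, and the reverse step will also turn a genuinely intended `%23` back into `#`.

```python
import re

# Hypothetical patterns: a %TAG directive (handle, then prefix) and a
# verbatim tag of the form !<...>. Neither is taken from libyaml.
TAG_DIRECTIVE = re.compile(r"^(%TAG[ \t]+\S+[ \t]+)(\S+)", re.MULTILINE)
VERBATIM_TAG = re.compile(r"!<([^>\s]*)>")


def _rewrite(yaml_text: str, old: str, new: str) -> str:
    """Replace old with new inside %TAG prefixes and verbatim tags only."""
    text = TAG_DIRECTIVE.sub(
        lambda m: m.group(1) + m.group(2).replace(old, new), yaml_text
    )
    return VERBATIM_TAG.sub(
        lambda m: "!<" + m.group(1).replace(old, new) + ">", text
    )


def encode_hash(yaml_text: str) -> str:
    """Run before parsing: percent-encode '#' in tag URIs."""
    return _rewrite(yaml_text, "#", "%23")


def decode_hash(yaml_text: str) -> str:
    """Run after serializing: restore '#' in tag URIs."""
    return _rewrite(yaml_text, "%23", "#")


if __name__ == "__main__":
    doc = (
        "%TAG ! http://www.w3.org/2001/XMLSchema#\n"
        "---\n"
        "date: !<http://www.w3.org/2001/XMLSchema#date> 2022-08-08 # note\n"
    )
    print(encode_hash(doc))
```

Ordinary YAML comments such as the trailing `# note` are left alone, because only the directive prefix and the text between `!<` and `>` are rewritten.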

This was tested using Ruby Psych version 4.0.4, which wraps libyaml, and the issues seem to lie entirely within the library.

gkellogg, Aug 17 '22 22:08