cl-yaml
Incorrect parsing of strings as integers
Not sure if this is a problem in cl-yaml or cl-libyaml (or with me), but if I have the following yaml:
- key: afsdf230fsfa
  secret: '22352364623'
secret gets parsed into an integer. It should get parsed into a string, since it's within single quotes. Neither secret: "22352364623" nor secret: !!str '22352364623' parses into a string either. I'm not entirely sure what the YAML spec says on the 'correct' way to do this. Thoughts?
I believe I've run into this problem before. This is almost certainly a problem with the scalar regular expressions.
OK, some feedback: putting this through cl-libyaml shows the single quotes are stripped:
YAML-EXAMPLE> (defpackage yaml-example
  (:use :cl)
  (:import-from :libyaml.macros
                :with-parser
                :with-event)
  (:import-from :libyaml.event
                :event-type))
(in-package :yaml-example)
(defun parse (string)
  (with-parser (parser string)
    (with-event (event)
      (loop do
        (when (libyaml.parser:parse parser event)
          (let ((type (event-type event)))
            (print type)
            (when (eql type :scalar-event)
              (print (libyaml.event:event-scalar-data event)))
            (when (eql type :stream-end-event)
              (return-from parse nil))))))))
STYLE-WARNING: redefining YAML-EXAMPLE::PARSE in DEFUN
PARSE
YAML-EXAMPLE> (parse "'22352364623'")
:STREAM-START-EVENT
:DOCUMENT-START-EVENT
:SCALAR-EVENT
(:ANCHOR NIL :TAG NIL :VALUE "22352364623")
:DOCUMENT-END-EVENT
:STREAM-END-EVENT
NIL
YAML-EXAMPLE>
I believe the parser should look at the scalar data style attribute. Consider this redefinition of libyaml.event:event-scalar-data:
(defun event-scalar-data (event)
  (let* ((scalar (scalar-pointer event))
         (anchor (foreign-slot-value scalar '(:struct scalar-t) 'anchor))
         (tag (foreign-slot-value scalar '(:struct scalar-t) 'tag))
         (value (foreign-slot-value scalar '(:struct scalar-t) 'value))
         (style (foreign-slot-value scalar '(:struct scalar-t) 'style)))
    (list :anchor anchor
          :tag tag
          :value value
          :style style)))
and the following examples (using your parse function from above):
YAML-EXAMPLE> (parse "22352364623")
:STREAM-START-EVENT
:DOCUMENT-START-EVENT
:SCALAR-EVENT
(:ANCHOR NIL :TAG NIL :VALUE "22352364623" :STYLE :PLAIN-SCALAR-STYLE)
:DOCUMENT-END-EVENT
:STREAM-END-EVENT
NIL
YAML-EXAMPLE> (parse "'22352364623'")
:STREAM-START-EVENT
:DOCUMENT-START-EVENT
:SCALAR-EVENT
(:ANCHOR NIL :TAG NIL :VALUE "22352364623" :STYLE :SINGLE-QUOTED-SCALAR-STYLE)
:DOCUMENT-END-EVENT
:STREAM-END-EVENT
NIL
Notice that when the scalar value is single quoted, the style will tell you so. I think that is how the parser will need to distinguish between numbers and text that looks like numbers.
While I provided a modified event-scalar-data from cl-libyaml to illustrate the point, I thought I would report my observation here first, before suggesting an issue be opened up in the cl-libyaml project.
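To make the idea concrete, here is a minimal sketch of how a parser could use the style to decide between an integer and a string. It works on the plist printed by the modified event-scalar-data above. The names integer-string-p and convert-scalar are hypothetical and are not part of cl-yaml or cl-libyaml, and the integer check is deliberately simpler than the real scalar regular expressions.

```lisp
;; Hypothetical helpers for illustration only; not cl-yaml or cl-libyaml API.

(defun integer-string-p (string)
  "True if STRING is an optional sign followed by one or more decimal digits."
  (let ((start (if (and (plusp (length string))
                        (member (char string 0) '(#\+ #\-)))
                   1
                   0)))
    (and (< start (length string))
         (every #'digit-char-p (subseq string start)))))

(defun convert-scalar (scalar)
  "SCALAR is a plist with :VALUE and :STYLE keys, as printed in the examples above.
Only plain (unquoted) scalars are considered for integer conversion."
  (let ((value (getf scalar :value))
        (style (getf scalar :style)))
    (if (and (eq style :plain-scalar-style)
             (integer-string-p value))
        (parse-integer value)
        value)))
```

With this, (convert-scalar '(:value "22352364623" :style :plain-scalar-style)) gives an integer, while the same value with :single-quoted-scalar-style stays a string.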
I've been reading the yaml spec, and now I think there is no bug here.
YAML uses tags to specify data type [0]. If no tag is provided then (assuming the core schema) the tag is resolved according to pattern matching rules [1]. Single and double quoting are considered part of a node's style. Node style is only a presentation detail, and should not change the way a tag is resolved [2].
[0] http://www.yaml.org/spec/1.2/spec.html#native%20data%20structure//
[1] http://www.yaml.org/spec/1.2/spec.html#id2805071
[2] http://www.yaml.org/spec/1.2/spec.html#id2768019 "Note that resolution must not ..."
In light of that, quoting as I suggested above should not result in the value being interpreted as a string. Rather, one method would be to tag the data like this:
- key: afsdf230fsfa
  secret: !!str 22352364623
Now, if you do that, cl-yaml still won't parse it correctly. The parser appears to rely entirely on default tag resolution and ignores tags when they are present. Making use of tag information is a feature that could be added. It would be great if that mechanism could be extended for user-specified data types.
Edit: The original comment did say the markup with the !!str tag did not work. I got sidetracked down the quoting train of thought. But I still think the issue here is a larger one related to making use of tag information.
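As a rough sketch of what making use of tag information could look like, here is a small tag-to-constructor table that users could extend with their own types. Everything here is hypothetical (*tag-constructors*, register-tag, and construct-scalar do not exist in cl-yaml), and it assumes the parser reports !!str as the full tag string "tag:yaml.org,2002:str", which I have not verified against cl-libyaml's output.

```lisp
;; Hypothetical sketch; none of these names exist in cl-yaml.

(defvar *tag-constructors* (make-hash-table :test #'equal)
  "Maps a tag string to a function of one argument, the scalar text.")

(defun register-tag (tag constructor)
  "Associate TAG with CONSTRUCTOR, replacing any earlier registration."
  (setf (gethash tag *tag-constructors*) constructor))

;; Assumed spellings of the resolved core schema tags.
(register-tag "tag:yaml.org,2002:str" #'identity)
(register-tag "tag:yaml.org,2002:int" #'parse-integer)

(defun construct-scalar (value tag default-resolver)
  "Use the constructor registered for TAG. When TAG is NIL or unregistered,
fall back to DEFAULT-RESOLVER, which stands in for the existing pattern matching."
  (let ((constructor (and tag (gethash tag *tag-constructors*))))
    (funcall (or constructor default-resolver) value)))
```

A user-specified type would then be one more register-tag call, for example (register-tag "!upcase" #'string-upcase). Falling back silently on an unknown tag is just one possible choice; signalling an error might be better.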
I left a question in #yaml last night to confirm my understanding. I was pointed to the bottom of Section 6.9.1 (http://www.yaml.org/spec/1.2/spec.html#id2784064):
See example 6.28 and the text right before it. "If a node has no tag property, it is assigned a non-specific tag that needs to be resolved to a specific one. This non-specific tag is “!” for non-plain scalars and “?” for all other nodes. This is the only case where the node style has any effect on the content information." (Emphasis mine)
They give the examples:
"12" => str
12 => int
Gotta watch out for those exceptions to the rules!
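For what it's worth, the quoted rule is small enough to sketch directly. This assumes the style keywords follow the naming seen in the transcripts above (:plain-scalar-style, and by analogy :double-quoted-scalar-style, which I have not checked); effective-scalar-tag is a hypothetical name, not an existing function.

```lisp
;; Hypothetical sketch of the non-specific tag rule quoted above.

(defun effective-scalar-tag (tag style)
  "Return the explicit TAG if there is one. Otherwise return the non-specific
tag implied by STYLE: \"?\" for plain scalars, \"!\" for non-plain scalars."
  (cond (tag tag)
        ((eq style :plain-scalar-style) "?")
        (t "!")))
```

A "!" result would then be taken as a string, and only a "?" result would go through the pattern matching that can produce an integer.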