cl-yaml

Incorrect parsing of strings as integers

Open CodyReichert opened this issue 9 years ago • 5 comments

Not sure if this is a problem in cl-yaml or cl-libyaml (or with me), but if I have the following yaml:

- key: afsdf230fsfa
  secret: '22352364623'

secret gets parsed into an integer. It should be parsed into a string, since it's within single quotes. Neither secret: "22352364623" nor secret: !!str '22352364623' parses into a string either. I'm not entirely sure what the YAML spec says about the 'correct' way to do this. Thoughts?

CodyReichert, Oct 05 '15 22:10

I believe I've run into this problem before. This is almost certainly a problem with the scalar regular expressions.

eudoxia0, Oct 05 '15 22:10

OK, some feedback: putting this through cl-libyaml shows that the single quotes are stripped:

YAML-EXAMPLE> (defpackage yaml-example
  (:use :cl)
  (:import-from :libyaml.macros
                :with-parser
                :with-event)
  (:import-from :libyaml.event
                :event-type))
(in-package :yaml-example)

(defun parse (string)
  (with-parser (parser string)
    (with-event (event)
      (loop do
        (when (libyaml.parser:parse parser event)
          (let ((type (event-type event)))
            (print type)
            (when (eql type :scalar-event)
              (print (libyaml.event:event-scalar-data event)))
            (when (eql type :stream-end-event)
              (return-from parse nil))))))))
STYLE-WARNING: redefining YAML-EXAMPLE::PARSE in DEFUN
PARSE
YAML-EXAMPLE> (parse "'22352364623'")

:STREAM-START-EVENT 
:DOCUMENT-START-EVENT 
:SCALAR-EVENT 
(:ANCHOR NIL :TAG NIL :VALUE "22352364623") 
:DOCUMENT-END-EVENT 
:STREAM-END-EVENT 
NIL
YAML-EXAMPLE> 

eudoxia0, Oct 12 '15 16:10

I believe the parser should look at the scalar's style attribute. Consider this redefinition of libyaml.event:event-scalar-data:

(defun event-scalar-data (event)
  (let* ((scalar (scalar-pointer event))
         (anchor (foreign-slot-value scalar '(:struct scalar-t) 'anchor))
         (tag (foreign-slot-value scalar '(:struct scalar-t) 'tag))
         (value (foreign-slot-value scalar '(:struct scalar-t) 'value))
         (style (foreign-slot-value scalar '(:struct scalar-t) 'style)))
    (list :anchor anchor
          :tag tag
          :value value
          :style style)))

and the following examples (using your parse function from above):

YAML-EXAMPLE> (parse "22352364623")

:STREAM-START-EVENT 
:DOCUMENT-START-EVENT 
:SCALAR-EVENT 
(:ANCHOR NIL :TAG NIL :VALUE "22352364623" :STYLE :PLAIN-SCALAR-STYLE) 
:DOCUMENT-END-EVENT 
:STREAM-END-EVENT 
NIL
YAML-EXAMPLE> (parse "'22352364623'")

:STREAM-START-EVENT 
:DOCUMENT-START-EVENT 
:SCALAR-EVENT 
(:ANCHOR NIL :TAG NIL :VALUE "22352364623" :STYLE :SINGLE-QUOTED-SCALAR-STYLE) 
:DOCUMENT-END-EVENT 
:STREAM-END-EVENT 
NIL

Notice that when the scalar value is single-quoted, the style tells you so. I think that is how the parser will need to distinguish between numbers and text that looks like numbers.

While I provided a modified event-scalar-data from cl-libyaml to illustrate the point, I thought I would report my observation here first, before suggesting an issue be opened up in the cl-libyaml project.

jasonmelbye, Jul 08 '16 04:07

I've been reading the YAML spec, and now I think there is no bug here.

YAML uses tags to specify data type [0]. If no tag is provided then (assuming the core schema) the tag is resolved according to pattern matching rules [1]. Single and double quoting are considered part of a node's style. Node style is only a presentation detail, and should not change the way a tag is resolved [2].

[0] http://www.yaml.org/spec/1.2/spec.html#native%20data%20structure//
[1] http://www.yaml.org/spec/1.2/spec.html#id2805071
[2] http://www.yaml.org/spec/1.2/spec.html#id2768019 "Note that resolution must not ..."
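
To make the pattern matching concrete, here is a minimal sketch of how an untagged plain scalar might be resolved under the core schema. The function names are hypothetical and are not part of cl-yaml or cl-libyaml. Only null, bool and decimal int are covered; floats, octal and hexadecimal are left out to keep it short.

```lisp
;; Sketch only: RESOLVE-PLAIN-SCALAR-TAG and DECIMAL-INTEGER-STRING-P are
;; hypothetical names, not cl-yaml's API. Floats, octal and hex are omitted.

(defun decimal-integer-string-p (string)
  "True if STRING is an optional sign followed by one or more ASCII digits."
  (let ((start (if (and (plusp (length string))
                        (find (char string 0) "+-"))
                   1
                   0)))
    (and (< start (length string))
         (every (lambda (c) (char<= #\0 c #\9))
                (subseq string start)))))

(defun resolve-plain-scalar-tag (string)
  "Return the core-schema tag that pattern matching assigns to STRING."
  (cond ((member string '("" "~" "null" "Null" "NULL") :test #'string=)
         "tag:yaml.org,2002:null")
        ((member string '("true" "True" "TRUE" "false" "False" "FALSE")
                 :test #'string=)
         "tag:yaml.org,2002:bool")
        ((decimal-integer-string-p string)
         "tag:yaml.org,2002:int")
        ;; Anything that matches no pattern falls back to a string.
        (t "tag:yaml.org,2002:str")))
```

With this sketch, (resolve-plain-scalar-tag "22352364623") returns the int tag, while (resolve-plain-scalar-tag "afsdf230fsfa") returns the str tag.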

In light of that, quoting as I suggested above should not result in the value being interpreted as a string. Rather, one method would be to tag the data like this:

- key: afsdf230fsfa
  secret: !!str 22352364623

Now, if you do that, cl-yaml still won't parse it correctly. The parser appears to rely entirely on default tag resolution and ignores tags when they are present. Making use of tag information is a feature that could be added. It would be great if that mechanism could be extended for user-specified data types.
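
One possible shape for such a mechanism is a table from tag to constructor function that users can extend. This is only a rough sketch: all names below are hypothetical, and it assumes the scalar event's tag arrives as a fully expanded string such as "tag:yaml.org,2002:str" (or NIL when no tag was written).

```lisp
;; Sketch only: *TAG-CONSTRUCTORS*, REGISTER-TAG and CONSTRUCT-SCALAR are
;; hypothetical names, not cl-yaml's API. Assumes tags are fully expanded
;; strings, or NIL when the node carries no explicit tag.

(defvar *tag-constructors* (make-hash-table :test #'equal)
  "Maps a tag string to a function that takes the scalar's string value.")

(defun register-tag (tag constructor)
  "Associate TAG with CONSTRUCTOR; user code can call this for its own tags."
  (setf (gethash tag *tag-constructors*) constructor))

(register-tag "tag:yaml.org,2002:str" #'identity)
(register-tag "tag:yaml.org,2002:int" #'parse-integer)

(defun construct-scalar (value tag fallback)
  "Use the constructor registered for TAG; otherwise call FALLBACK on VALUE.
FALLBACK stands in for whatever default resolution the parser already does."
  (let ((constructor (and tag (gethash tag *tag-constructors*))))
    (funcall (or constructor fallback) value)))
```

For example, (construct-scalar "22352364623" "tag:yaml.org,2002:str" #'parse-integer) keeps the value as a string, while passing NIL as the tag falls through to the fallback and yields an integer. In this sketch an unregistered tag also falls through to the fallback; signalling an error instead would be an equally reasonable choice.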

Edit: The original comment did say the markup with the !!str tag did not work. I got sidetracked down the quoting train of thought. But I still think the issue here is a larger one related to making use of tag information.

jasonmelbye, Jul 13 '16 03:07

I left a question in #yaml last night to confirm my understanding. I was pointed to the bottom of Section 6.9.1 http://www.yaml.org/spec/1.2/spec.html#id2784064:

See example 6.28 and the text right before it. "If a node has no tag property, it is assigned a non-specific tag that needs to be resolved to a specific one. This non-specific tag is “!” for non-plain scalars and “?” for all other nodes. This is the only case where the node style has any effect on the content information." (Emphasis mine)

They give the examples:

"12" => str
12 => int

Gotta watch out for those exceptions to the rules!
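
As a rough sketch of what that rule could look like for untagged scalars, using the style keywords printed by the modified event-scalar-data above. The function names are hypothetical, and parse-integer is only a stand-in for the real pattern matching (it is more lenient, for instance about surrounding whitespace).

```lisp
;; Sketch only: NON-SPECIFIC-TAG and RESOLVE-UNTAGGED-SCALAR are hypothetical
;; names. The style keyword :PLAIN-SCALAR-STYLE is the one shown in the REPL
;; output earlier in this thread.

(defun non-specific-tag (style)
  "Return \"?\" for plain scalars and \"!\" for every other scalar style."
  (if (eq style :plain-scalar-style) "?" "!"))

(defun resolve-untagged-scalar (value style)
  "Resolve a scalar that has no explicit tag.
Non-plain (quoted or block) scalars stay strings; plain scalars are
converted to an integer when they parse as one."
  (if (string= (non-specific-tag style) "!")
      value
      ;; PARSE-INTEGER is a stand-in for the schema's pattern matching.
      (or (ignore-errors (parse-integer value)) value)))
```

With this, (resolve-untagged-scalar "12" :plain-scalar-style) gives the integer 12, and (resolve-untagged-scalar "12" :single-quoted-scalar-style) gives the string "12".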

jasonmelbye, Jul 13 '16 12:07