
When parsing, is there a way to know the "real type" of a scalar?

Open • JonathanGirardeau opened this issue 3 years ago • 7 comments

Hello, I have not started using this library yet, but it seems very interesting. I would like to know if it is possible to check the "real type" of a scalar. For example, in this YAML:

```yaml
hello: 1234
world: "1234"
```

When I parse it, which API of the library can I call to find out that the value of the `hello` field is a number and the value of the `world` field is a string?

JonathanGirardeau • Sep 28 '22 17:09

The "real type" of a scalar is a question whose answer necessarily depends on the application; there is no context-free final answer. E.g., is `nan` a number or a string? Should a 2D vector represented as `(0.5,0.714)` be treated as a number or as a string? Or consider an enum, or even a plain number that may actually be a string, as in your example above. All these questions are application-dependent.

Having said that, if you have no context and therefore don't know the type of a node before deserializing, ryml also gives you a toolbox in the tree, in the node and in csubstr, that can be used to figure out some information:

```cpp
#include <ryml.hpp>
#include <cassert>

int main()
{
    ryml::Tree t = ryml::parse_in_arena(R"(
hello: 1234
world: "1234"
)");
    // does the node have a val?
    assert(t["hello"].has_val());
    assert(t["world"].has_val());
    // does the val compare with a string?
    assert(t["hello"].val() == "1234");
    assert(t["world"].val() == "1234");
    // is the val quoted?
    assert(t["hello"].is_val_quoted() == false);
    assert(t["world"].is_val_quoted() == true);
    // does it look like a number (real or integer or unsigned)?
    // the quotes are not part of the val, so this is true for both
    assert(t["hello"].val().is_number() == true);
    assert(t["world"].val().is_number() == true);
    // see also csubstr::is_integer(), csubstr::is_real(), etc
    return 0;
}
```

HTH.

biojppm • Sep 28 '22 18:09

If you have a more concrete question about a problem you're trying to address, I'd be happy to help.

biojppm • Sep 28 '22 19:09

Closing now, feel free to reopen if there are more questions.

biojppm • Sep 30 '22 10:09

It would be nice to have this mentioned in the quickstart. Users usually expect a built-in, straightforward way to get the type of the value in a node, and YAML data types are documented. With this library I had to read through all of quickstart.cpp, then look into the definitions (because I didn't find it in the quickstart), and finally resort to GitHub issues... It is also confusing because `val()` returns a c4 `csubstr`, a string-view utility used by the library, so no one would expect to look up YAML/JSON-related methods there. One would expect methods like `is_bool`, `is_real`, `is_null`, etc. to be part of the YAML library itself.

sergio-eld • Jul 06 '24 21:07

> Yaml data types are documented

Do you mean data types or do you mean tags?

If you mean tags instead of data types, then YAML does indeed have several basic, common tags such as !!str, etc. So does rapidyaml.

OTOH, if you do mean data types, there exist only three YAML data types: seqs, maps, and scalars. Scalars are string-derived values, and the spec is clear that the meaning of an untagged node is application-specific:

> In YAML, untagged nodes are given a type depending on the application.

Specifically, for untagged nodes,

> If a document contains unresolved tags, the YAML processor is unable to compose a complete representation graph. In such a case, the YAML processor may compose a partial representation, based on each node’s kind [...]

The node's kind is of course only one of seq, map, or scalar.

So if you want to infer a scalar's type from its string representation, that is purely a matter of inspecting the string, and that is why the helpers are string methods; YAML does not and cannot specify how an untagged scalar should map to a type.

If OTOH you want to resolve tags, there are ample facilities in the library to achieve that.


Having said that, and to ensure that there is clear understanding, can you provide an example of your application code? What is it that you're trying to do, and how would you like to get it done?

biojppm • Jul 07 '24 00:07

I forgot to mention that I'm using the library to parse JSON. Indeed, in the case of YAML it is rather hard to distinguish between types without tags. I am successfully using the string functions; the point of my comment is that it would be nice to have those functions mentioned in the quickstart, with JSON examples.

sergio-eld • Jul 07 '24 02:07

This part of the YAML spec says that with the "core schema",

> Scalars with the “?” non-specific tag (that is, plain scalars) are matched with an extended list of regular expressions. However, in this case, if none of the regular expressions matches, the scalar is resolved to tag:yaml.org,2002:str (that is, considered to be a string).

and it gives a list of regular expressions for various scalar types. For use cases like "I want to convert YAML data into Python objects", I guess the natural implementation would be:

  1. Use these regular expressions to associate scalars with tags,
  2. Use the inferred tags to instantiate values with appropriate types.

I don't know if this "should" be part of rapidyaml, but I think functions to classify scalars using rules like these (especially standardized ones) would be useful for some users.

MatthewSteel • Jun 30 '25 15:06