strictyaml icon indicating copy to clipboard operation
strictyaml copied to clipboard

Difficult to use Optional("key", None)

Open wwoods opened this issue 5 years ago • 11 comments

Great project, just a few usability issues.

Assume the following schema:

import strictyaml as s
s.load("a: 8", s.Map({ 
    s.Optional("a", 1): s.Int(),
    s.Optional("key_a", None): s.Str(), 
    s.Optional("key_b", {}): s.EmptyDict() | s.Map({})
}))

A few issues with usability arise:

  1. Why is a: 8 required from the YAML file? It seems like a blank file (or one consisting only of comments) should be perfectly valid with this schema.
  2. The result is YAML(OrderedDict([('a', 8), ('key_b', {})])). It is confusing that key_a is missing entirely from the results, despite having a default value of None. A common pattern for fixing this in Python is to define Optional using e.g.:
missing = {}  # Creates a unique object
class Optional:
    def __init__(self, key, default=missing):
        ...

# Later
if optional.default is missing:
    # User specified no default, OK not to include in output

In this way, Optional with no default specified can result in the absence of the key, while Optional with any default would always be present in the output.

wwoods avatar Jun 21 '19 23:06 wwoods

Ah, one more I forgot about: there seems to be no way to specify an empty dictionary within the yaml file: hi: {} of course violates the no-flow ideology of StrictYAML. I agree with no flows in general, but some way to specify a blank dictionary is important in some cases (e.g. Seq(Map({Optional('k'): Str()})), where all values in the Map are optional).

wwoods avatar Jun 21 '19 23:06 wwoods

Why is a: 8 required from the YAML file? It seems like a blank file (or one consisting only of comments) should be perfectly valid with this schema.

You have stated that there is a mapping, therefore there needs to be a mapping. A mapping must contain at least one key value pair, otherwise it's not a mapping. You can have all keys being optional if you like but at least one must be used.

If you want a blank file to be valid you can just do EmptyNone() | Map({...}) (or EmptyDict, to have it parse to an empty dict).

The result is YAML(OrderedDict([('a', 8), ('key_b', {})])). It is confusing that key_a is missing entirely from the results, despite having a default value of None

Granted. I'll have a think over this edge case.

Ah, one more I forgot about: there seems to be no way to specify an empty dictionary within the yaml file: hi: {} of course violates the no-flow ideology of StrictYAML. I agree with no flows in general, but some way to specify a blank dictionary is important in some cases (e.g. Seq(Map({Optional('k'): Str()})), where all values in the Map are optional).

I believe you're looking for EmptyDict, which will treat a scalar value with nothing there as an empty dict ("x:").

crdoconnor avatar Jun 22 '19 06:06 crdoconnor

You have stated that there is a mapping, therefore there needs to be a mapping. A mapping must contain at least one key value pair, otherwise it's not a mapping

Why is that, anyway? I can accept it, but no other language / data structure makes the distinction between an empty map and a mapping with values. A mapping it just a type of data structure, no reason it shouldn't be empty. The functionality added by Optional means that strictyaml covers the use cases of empty and mandatory non-empty mappings, without needing a separate EmptyDict class (which is also named strangely; why no EmptyMap?).

If you want a blank file to be valid you can just do EmptyNone() | Map({...}) (or EmptyDict, to have it parse to an empty dict).

Good to know.

I believe you're looking for EmptyDict, which will treat a scalar value with nothing there as an empty dict ("x:").

I still have a minor grievance with this, in that a schema is then required to produce an empty dict, but I'll admit it's a corner case.

wwoods avatar Jun 22 '19 16:06 wwoods

Why is that, anyway? I can accept it, but no other language / data structure makes the distinction between an empty map and a mapping with values. A mapping it just a type of data structure, no reason it shouldn't be empty.

Two reasons:

  1. Paring down YAML to the bare minimum and losing flow style meant losing the ability to represent non-empty mappings without schemas. I'm pretty ok with this since blank string suffices as a replacement. I have been using it as such.

  2. I think the distinction between an empty mapping and a nominally blank value is somewhat confusing for a non-developer, and StrictYAML, unlike YAML, is supposed to be super simple for non developers to use.

I still have a minor grievance with this, in that a schema is then required to produce an empty dict, but I'll admit it's a corner case.

You can also not use a schema and just manually convert empty string to empty dict.

Though StrictYAML is not really mean to be used without schema for anything halfway complicated.

crdoconnor avatar Jun 23 '19 11:06 crdoconnor

Hmm, so you're saying that all of the Empty* schemas were built with empty strings in mind, rather than e.g. the dict which is returned being empty. That makes a bit more sense.

I think the distinction between an empty mapping and a nominally blank value is somewhat confusing for a non-developer, and StrictYAML, unlike YAML, is supposed to be super simple for non developers to use.

Absolutely, but there's no need to do so at the cost of convenience when developing with StrictYAML. My main issue with usability is that Map({...}) refuses to validate {}, e.g. when the Map is the value attached to an Optional('key_to_map', {}), and the Map has all optional keys:

import strictyaml as s
s.load('a: 8', s.Map({'a': s.Int(), s.Optional('b', {}): s.Map({s.Optional('key', 8): s.Int()})}))
# InvalidOptionalDefault: Optional default for 'b' failed validation

If you don't want to support a blank string in this case, fine, but I don't think it should be necessary to add EmptyDict to the schema in this case.

wwoods avatar Jun 23 '19 16:06 wwoods

Hmm, so you're saying that all of the Empty* schemas were built with empty strings in mind, rather than e.g. the dict which is returned being empty. That makes a bit more sense.

Precisely. Since there's no clear one to one relationship between empty dict and YAML you have to use the schema to nail down what it should actually be - is empty dict the same as not having a key? Or is it having a key with an empty string? Or, is it just meaningful at all?

This discussion clearly indicates that the docs could be a bit clearer on this point, however.

Absolutely, but there's no need to do so at the cost of convenience when developing with StrictYAML. My main issue with usability is that Map({...}) refuses to validate {}, e.g. when the Map is the value attached to an Optional('key_to_map', {}), and the Map has all optional keys:

So, a schema is used not only to parse StrictYAML to data but also to serialize data to StrictYAML. There's is, by necessity, a symmetric reflexive relation here. Any optional parameter has to be something that could unambiguously be serialized into StrictYAML and parsed back again with no loss of meaning or data. Should an empty dict be serialized as a key with an empty string? No key at all? Or is it simply invalid? Should an empty string be parsed as a dict? Should it be an empty string? None?

This is what having a schema nails down - and nails it down in such a way that subtle bugs don't creep in (e.g. erroneously getting back None where you actually expected an empty dict). It might seem like onerous work up front but the strictness is supposed to protect you from subtle bugs later on and give you absolute confidence in what you're parsing or serializing, as well as the ability to trivially and safely roundtrip data.

If you don't want to support a blank string in this case, fine, but I don't think it should be necessary to add EmptyDict to the schema in this case.

Well, with respect, I don't think it's all that onerous to add emptydicts or emptynones.

I understand that Map with all Optionals might imply that an empty dict is possible (and that could be confusing), but I need some way to resolve the natural ambiguities while maintaining symmetric parse/serialization reflexivity.

Am open to other suggestions as to how to square this circle, but it needs to work well and simply in all cases and not just this one.

crdoconnor avatar Jun 24 '19 02:06 crdoconnor

Hmm, so you're saying that all of the Empty* schemas were built with empty strings in mind, rather than e.g. the dict which is returned being empty. That makes a bit more sense.

Precisely. Since there's no clear one to one relationship between empty dict and YAML you have to use the schema to nail down what it should actually be - is empty dict the same as not having a key? Or is it having a key with an empty string? Or, is it just meaningful at all?

This discussion clearly indicates that the docs could be a bit clearer on this point, however.

Absolutely, but there's no need to do so at the cost of convenience when developing with StrictYAML. My main issue with usability is that Map({...}) refuses to validate {}, e.g. when the Map is the value attached to an Optional('key_to_map', {}), and the Map has all optional keys:

So, a schema is used not only to parse StrictYAML to data but also to serialize data to StrictYAML. There's is, by necessity, a symmetric reflexive relation here. Any optional parameter has to be something that could unambiguously be serialized into StrictYAML and parsed back again with no loss of meaning or data. Should an empty dict be serialized as a key with an empty string? No key at all? Or is it simply invalid? Should an empty string be parsed as a dict? Should it be an empty string? None?

This is what having a schema nails down - and nails it down in such a way that subtle bugs don't creep in (e.g. erroneously getting back None where you actually expected an empty dict). It might seem like onerous work up front but the strictness is supposed to protect you from subtle bugs later on and give you absolute confidence in what you're parsing or serializing, as well as the ability to trivially and safely roundtrip data.

If you don't want to support a blank string in this case, fine, but I don't think it should be necessary to add EmptyDict to the schema in this case.

Well, I don't think it's all that onerous to add emptydicts or emptynones. I have to do this too, of course, but I've never had to do it so many times that it became an issue for me.

I understand that Map with all Optionals might imply that an empty dict is possible (and that could be confusing), but I need some way to resolve the natural ambiguities while maintaining symmetric parse/serialization reflexivity.

Am open to other suggestions as to how to square this circle, but it needs to work well, unambiguously and simply in all cases and not just this one.

crdoconnor avatar Jun 24 '19 02:06 crdoconnor

So, a schema is used not only to parse StrictYAML to data but also to serialize data to StrictYAML. There's is, by necessity, a symmetric reflexive relation here. Any optional parameter has to be something that could unambiguously be serialized into StrictYAML and parsed back again with no loss of meaning or data

Even with your longer description, I completely fail to see why Seq cannot serialize an empty list. After all, I would presume that strictyaml assumes that the same schema is used for loading what it saves. So, if Seq should treat empty strings the same way EmptyList does, that behavior is consistent for both saving and loading. Same with Map and EmptyDict. The only remaining issue would be that strictyaml would need to remember if the key existed in the file as loaded, but as far as I can tell, you already have that behavior implemented for e.g. comments. Could add a similar bit of metadata for these fields.

Well, with respect, I don't think it's all that onerous to add emptydicts or emptynones.

The problem is that you're forcing users of strictyaml to be aware of this. Yes, it's not too bad, but on the other hand it's a real hurdle for newcomers (have to trawl through the docs and learn about EmptyDict / EmptyList).

wwoods avatar Jun 27 '19 19:06 wwoods

I’ve admittedly only skimmed this thread but I would like to echo point 2 made by @wwoods in the initial comment: Having non-provided default values be missing in the YAML mapping is … inconvenient.

Effectively this means that my code is littered with the equivalent of

name = data['name'] if 'name' in data else None

It’s not tragic but it is unnecessary clutter. In particular it means that I can’t use values from the mapping directly in expressions, I first have to assign them to a local variable. For instance, I’d like to be able to write (actual code)

if not test['disabled'] or test['name'] in self.include_tests:
    self.enable(test)

However, this crashes. Instead I’d have to write

if not test['disabled'] or ('name' in test and test['name'] in self.include_tests):
    self.enable(test)

klmr avatar Jul 03 '19 16:07 klmr

@klmr just a tiny suggestion that

name = data['name'] if 'name' in data else None

can easily be replaced with

name = data.get('name')

which makes the syntactic burden of this feature of strictyaml easier to deal with.

jamespfennell avatar Sep 13 '19 16:09 jamespfennell

The referenced issue above fixed this problem for me. By specifying the schema like this:

schema = Map({
       Optional("endpoint_url", drop_if_none=False): EmptyNone() | Str()
})

It sets the value to None instead of dropping the key completely.

major-mayer avatar Apr 27 '21 15:04 major-mayer