Can not load document created by StrictYAML

Open sio opened this issue 6 years ago • 9 comments

StrictYAML dumps empty dictionary as {} which it later refuses to load because it's "ugly flow style". That makes it impossible to load documents created by StrictYAML back into itself.

Example (strictyaml==0.13.0)

>>> from strictyaml import as_document, load
>>> data = {'hello': {}}
>>> doc = as_document(data)
>>> doc
YAML(OrderedDict([('hello', OrderedDict())]))
>>> serialized = doc.as_yaml()
>>> serialized
'hello: {}\n'
>>> doc2 = load(serialized)

…

strictyaml.exceptions.FlowMappingDisallowed: While scanning
  in "<unicode string>", line 1, column 8:
    hello: {}
           ^ (line: 1)
Found ugly disallowed JSONesque flow mapping (surround with ' and ' to make text appear literally)
  in "<unicode string>", line 1, column 9:
    hello: {}
            ^ (line: 1)

sio avatar Oct 15 '18 15:10 sio

Yikes, that's not good. I'll take a look.

Thanks for reporting this!

crdoconnor avatar Oct 15 '18 15:10 crdoconnor

Hmm, I think I'm inclined to raise an exception in as_document in this case and refuse to accept dicts unless they have at least one item - since without a schema strictyaml won't parse empty dicts.

In theory I think as_document should be able to take them if you give it a schema with EmptyDict though (which would serialize as an empty string).

All comments, suggestions as to the behavior, details from your use case, etc. welcomed, btw. I'm a little unsure how to approach this feature because I've not actually used as_document for anything very major.

crdoconnor avatar Oct 15 '18 15:10 crdoconnor

Empty dicts are valid data points in my use case, so throwing an exception on encountering them would be very unfortunate.

My process is basically this:

  1. Read structured data from file
  2. Modify it (heavily)
  3. Write it back to file

I handle schema validation with other tools, so I was trying to avoid a second validation pass with StrictYAML. I load the data without any schema, then take whatever is in the .data property and work with that. I tried working with the YAML object directly, but could not figure out how to disable schema validation completely, and I often ran into validation errors otherwise.

I come back to StrictYAML only to dump the results to a file; that's when I need to create a YAML object from native Python data structures. Maybe I was wrong to use as_document(object).as_yaml() for that? In essence, I need an analog of json.dumps().

sio avatar Oct 15 '18 16:10 sio

I see two solutions here:

  1. My recommended one would be to implement a schema in strictyaml and modify the YAML object directly. You can use the EmptyDict | Map/MapPattern for mappings which could be empty, and when empty it would render to YAML as an empty string.

  2. My less recommended way would be to take your data with empty dicts and transform them into empty strings and vice versa. StrictYAML without a schema can do empty strings, but it can't do empty dicts (or lists).
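The second option can be sketched in plain Python, independent of StrictYAML itself. This is only an illustration: the helper names `empty_dicts_to_strings` and `empty_strings_to_dicts` are hypothetical, empty lists are not handled, and the reverse step is lossy because a value that really was an empty string also comes back as an empty dict.

```python
def empty_dicts_to_strings(data):
    """Return a copy of data with every empty dict replaced by ''."""
    if isinstance(data, dict):
        if not data:
            return ""
        return {key: empty_dicts_to_strings(value) for key, value in data.items()}
    if isinstance(data, list):
        return [empty_dicts_to_strings(item) for item in data]
    return data


def empty_strings_to_dicts(data):
    """Reverse step: replace every '' with {} after loading.

    Lossy: a value that was genuinely an empty string also becomes {}.
    """
    if isinstance(data, dict):
        return {key: empty_strings_to_dicts(value) for key, value in data.items()}
    if isinstance(data, list):
        return [empty_strings_to_dicts(item) for item in data]
    return {} if data == "" else data
```

The first helper would run on the native data just before as_document(), and the second on the .data of the loaded document.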

Unfortunately, while light modification to YAML objects is currently working fine, heavy modification is slightly buggy and so probably won't work for your use case right now (that code is a bit hacky). I will release version 0.14.0 tomorrow which should allow for heavy modification though.

Incidentally, may I ask what you're currently using for schema validation instead? Are there things it does that StrictYAML's schema doesn't?

crdoconnor avatar Oct 17 '18 22:10 crdoconnor

Thank you for your reply. I will wait for the next release and see how it goes.

You should probably add that exception you've mentioned or add a note somewhere in the documentation that empty dicts without schema are not supported.

As for validation, I use jsonschema. I chose it because I needed to completely decouple schemas from the code and because I need to be able to validate schemas from third parties. I have considered XSD for a while, but Python tooling for jsonschema seemed much nicer.

I did not consider StrictYAML schemas or other schemas defined with Python code because loading such schemas from third parties without security issues would be next to impossible.

sio avatar Oct 18 '18 07:10 sio

Yes, I'll update the docs along with the release. This is an important edge case.

I could also investigate generating StrictYAML schemas from JSON schema.

It's been requested before. I've always been reluctant since when I used it I was perpetually frustrated by its limitations, but if there are people out there that use it and love it then I'm happy to support it.

Unfortunately there are also other downsides to separating parsing and schema validation (e.g. losing the location of the error).

crdoconnor avatar Oct 18 '18 11:10 crdoconnor

Release 0.14.0 took a little longer than I expected, but it's out now. An exception is raised if you attempt to serialize empty dicts or lists:

https://hitchdev.com/strictyaml/using/alpha/howto/build-yaml-document/
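For callers who would rather find the offending values before serializing, a small pre-check is enough. The sketch below is not StrictYAML's own check; `find_empty_containers` is a hypothetical helper that walks nested dicts and lists and reports where the empty ones are.

```python
def find_empty_containers(data, path=()):
    """Yield the path (tuple of keys/indexes) of every empty dict or list."""
    if isinstance(data, (dict, list)):
        if not data:
            yield path
            return
        # Dicts are walked by key, lists by index.
        items = data.items() if isinstance(data, dict) else enumerate(data)
        for key, value in items:
            yield from find_empty_containers(value, path + (key,))


data = {"hello": {}, "items": [1, [], {"deep": {}}]}
print(list(find_empty_containers(data)))
# [('hello',), ('items', 1), ('items', 2, 'deep')]
```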

However, you can now serialize empty dicts/lists with a schema as documented here:

https://hitchdev.com/strictyaml/using/alpha/scalar/empty/

I'll have a go today at building a prototype that turns JSON schemas into StrictYAML schema objects. I'll release it as a separate module, however (you will need to pip install strictyamljsonschema).

crdoconnor avatar Oct 20 '18 10:10 crdoconnor

I've made a start here, but it's quite MVP: https://github.com/hitchdev/strictyamljsonschema

It would help if you let me know which types and what properties on those types you need from here: https://json-schema.org/understanding-json-schema/index.html (e.g. string, integer, number, object (properties, required properties)).

There's a fair bit in there that doesn't look useful to most people.

crdoconnor avatar Oct 20 '18 12:10 crdoconnor

Thank you for devoting so much attention to my issue!

Here are the schemas I'm currently using: one, two, but the idea is to support any JSON schema from any source. At the moment I use the basic value types (string, array, object), limit or allow creating additional properties (additionalProperties), use oneOf and anyOf, specify required fields and field name patterns (patternProperties).

The tricky part in my use case might be that I need to use separate schemas for the whole document and for several of its branches at the same time. E.g. the top-level document allows any dictionary as the branch value, but the branch schema specifies a lot more restrictions on that same dict.
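A rough sketch of that two-level arrangement, with made-up names: the schemas, the "settings" branch and `validate_with_branches` are illustrative assumptions, not my real ones. The validator is passed in as a callable taking (instance, schema), which is the call shape of jsonschema.validate.

```python
def validate_with_branches(document, document_schema, branch_schemas, validate):
    """Validate the whole document, then each listed branch against its own schema.

    `validate(instance, schema)` is injected and is expected to raise on failure.
    """
    validate(document, document_schema)
    for key, schema in branch_schemas.items():
        if key in document:
            validate(document[key], schema)


# Top-level schema: the "settings" branch may be any object.
DOCUMENT_SCHEMA = {
    "type": "object",
    "properties": {"settings": {"type": "object"}},
    "required": ["settings"],
}

# Branch schema: the same dict, with much tighter rules.
SETTINGS_SCHEMA = {
    "type": "object",
    "patternProperties": {"^[a-z_]+$": {"type": "string"}},
    "additionalProperties": False,
}
```

With jsonschema installed, the call would look like validate_with_branches(doc, DOCUMENT_SCHEMA, {"settings": SETTINGS_SCHEMA}, jsonschema.validate).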

PS: I do not want to mislead you into feeling any extra pressure from me. You should know that you're not blocking my progress in any way. I already have implemented serialization/deserialization routines with JSON/YAML backends and StrictYAML support is mentally tagged as "a nice feature to add when I'm done with everything else". Yours is a great project and even if I don't get to use it this time, I'll keep it in mind for the future :)

sio avatar Oct 21 '18 11:10 sio