pyyaml
Handle invalid datetimes properly.
Fixes issue #382.
- Move part of the code from constructor.py to resolver.py.
- Add the function `check_yaml_tag(tag, value)` in resolver.py to check whether the timestamp is valid. If it is not, do not return `tag:yaml.org,2002:timestamp`.
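A minimal, standalone sketch of the resolver-side idea in the PR description, assuming a simplified date-only version of the YAML 1.1 timestamp pattern (the real pattern also accepts times and time zones). `resolve_plain_scalar` and `is_valid_date` are hypothetical names, not PyYAML's resolver API and not the PR's actual `check_yaml_tag`. The point is only that the timestamp tag is chosen when the text both matches the pattern and forms a real calendar date.

```python
import datetime
import re

# Simplified, date-only stand-in for the YAML 1.1 timestamp pattern (assumption).
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

TIMESTAMP_TAG = "tag:yaml.org,2002:timestamp"
STR_TAG = "tag:yaml.org,2002:str"


def is_valid_date(value: str) -> bool:
    """Return True only if Python can represent the text as a datetime.date."""
    try:
        datetime.date.fromisoformat(value)
    except ValueError:
        return False
    return True


def resolve_plain_scalar(value: str) -> str:
    """Hypothetical resolver step for an untagged plain scalar.

    The timestamp tag is returned only when the pattern matches AND the
    value is a valid date; otherwise the scalar is resolved as a string.
    """
    if DATE_RE.fullmatch(value) and is_valid_date(value):
        return TIMESTAMP_TAG
    return STR_TAG
```

Doing the check at resolve time means the constructor never sees an impossible date, at the cost of parsing the value twice.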
Thanks, we'll take a look at this for 5.4. At a glance, the code looks OK, but I think we'll want a few more negative test cases.
@nitzmahone I added some test cases and fixed a bug in pyyaml/lib/yaml.
@nitzmahone is this going to be fixed in 5.4? The existing behavior is truly broken and not compliant with the relevant YAML standards. It causes me frequent problems, and @dota17's MR looks to be a solid fix.
I'm looking at this for 5.4, but I'm not sure I follow it yet.
Let me try: this patch makes a plain scalar that implicitly resolves as a date, but fails to load as a Python date object, load as a string instead.
Is that right?
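To make that summary concrete: `0000-00-00` matches the date pattern, but `datetime.date` rejects year 0, month 0 and day 0. The sketch below shows the load-as-string behavior at construction time, assuming a simplified date-only pattern; `load_plain_date` is a hypothetical helper, not PyYAML's constructor.

```python
import datetime
import re

# Simplified, date-only stand-in for the implicit timestamp pattern (assumption).
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def load_plain_date(value: str):
    """Return a datetime.date if possible, else the original text.

    A scalar that *looks* like a date but cannot be built as one
    (e.g. 0000-00-00) is returned unchanged as a str.
    """
    if not DATE_RE.fullmatch(value):
        return value
    year, month, day = (int(part) for part in value.split("-"))
    try:
        return datetime.date(year, month, day)
    except ValueError:
        # year 0, month 0 or day 0 are outside what datetime.date accepts
        return value
```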
I'm not sure what the right thing to do is. Which schema and implicit typing rules a loader uses by default is a decision for each YAML implementation.
A less contentious fix might be to give a clear error message saying that the value is not a valid date and should be loaded as a string.
I might put this off until 6.0 but I welcome discussion now.
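The less contentious alternative could look roughly like this: keep resolving the value as a timestamp, but turn the bare `ValueError` from `datetime` into a message that tells the user to quote the scalar. `InvalidImplicitDate` and `construct_date_strict` are hypothetical names for illustration only, and the date-only input format is an assumption.

```python
import datetime


class InvalidImplicitDate(ValueError):
    """Hypothetical error raised when an implicitly typed date is invalid."""


def construct_date_strict(value: str) -> datetime.date:
    """Build a date from a scalar that was already resolved as a timestamp.

    Instead of letting a bare ValueError escape, explain what went wrong
    and how to load the value as a string.
    """
    try:
        return datetime.date.fromisoformat(value)
    except ValueError as exc:
        raise InvalidImplicitDate(
            f"{value!r} was implicitly resolved as a timestamp but is not a "
            f"valid date ({exc}); quote it to load it as a string"
        ) from exc
```

Because `InvalidImplicitDate` subclasses `ValueError` in this sketch, callers that already catch `ValueError` would keep working.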
@reyjrar's initial issue #382 has as clearly elucidated a description of the issue as possible. It is not currently possible to load the string "0000-00-00" without explicitly tagging the value as a string. This means that the semantics of a Python datetime value are being forced into tank in spite of the tank standard clearly saying that this case should fail open, resulting in a cast to string. This makes perfect sense: 0000-00-00 is obviously not a valid date, but it is a perfectly acceptable and valid string.
> @reyjrar's initial issue #382 has as clearly elucidated a description of the issue as possible. It is not currently possible to load the string "0000-00-00" without explicitly tagging the value as a string.

Explicit tagging is not required. One can (and in this case should) quote the value to make it load as a string.
```yaml
a: '0000-00-00'
b: "0000-00-00"
c: !!str 0000-00-00
```
```
$ python -c 'import sys, yaml; print(yaml.safe_load(sys.stdin.read()))' < dates.yaml
{'a': '0000-00-00', 'c': '0000-00-00', 'b': '0000-00-00'}
```
> This means that the semantics of a Python datetime value are being forced into tank in spite of the tank standard clearly saying that this case should fail open, resulting in a cast to string.

I don't follow what tank means here.
> I don't follow what tank means here.

Dumb autocorrect: "yaml", not "tank".
Quote wrapping is another form of explicit type tagging. The important point here is that arbitrary, type-specific limitations of the target language should not cause a hard failure when a value is cast into a native type only because of an implicit type match.
The zeroed date string is just one example. The solution that seems obvious to me is the one indicated in the YAML standard: when casting to a native type on an implicit pattern match, if the value cannot be represented in the native type, deserialization should fall back to a string.
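The fallback rule proposed here can be sketched generically, assuming a small hypothetical table of implicit resolvers (simplified patterns for integers, dates and date-times, built with stdlib constructors). This is not how PyYAML actually wires its resolver and constructor together; it only illustrates that a failed cast after an implicit match leaves the text as a string, and that the zeroed date is one case among several (an out-of-range hour behaves the same way).

```python
import datetime
import re
from typing import Any

# Hypothetical, simplified implicit resolvers: (pattern, native constructor).
IMPLICIT_RESOLVERS = [
    (re.compile(r"[-+]?[0-9]+"), int),
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), datetime.date.fromisoformat),
    (
        re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}"),
        lambda s: datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
    ),
]


def load_plain_scalar(value: str) -> Any:
    """Resolve an untagged plain scalar, falling back to str on a failed cast.

    The first pattern that matches decides the native type; if that type
    cannot represent the value, the implicit guess is treated as wrong and
    the original text is returned.
    """
    for pattern, construct in IMPLICIT_RESOLVERS:
        if pattern.fullmatch(value):
            try:
                return construct(value)
            except ValueError:
                break  # implicit match, but not representable: keep the string
    return value
```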
> ...the one indicated in the YAML standard: when casting to a native type on an implicit pattern match, if the value cannot be represented in the native type, deserialization should fall back to a string.

Can you link me to the part of the YAML 1.1 Spec that states the above?
I haven't been able to locate anything like that so far.
> Can you link me to the part of the YAML 1.1 Spec that states the above?
> I haven't been able to locate anything like that so far.

Section 3.3 has a fairly detailed description of how failures are handled during loading, especially Figure 3.7, which, in my reading, says that loading should fall back to a partial representation, keeping the value as a scalar, when it is unrecognized or invalid as a native data structure.
If I have misread or misconstrued the intent of this section, I would love a "plain English" clarification of the standard's intent. It seems reasonable to me that an implicitly typed value should never raise an error just because the data cannot be cast into the native type. The failure to cast should instead be taken as a strong indicator that the implicitly identified type is not valid for the data.
Conversely, if the type is explicitly tagged, then validation when casting into the native representation /must/ be applied, and an exception /must/ be raised if it fails.
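The implicit/explicit distinction drawn above could be sketched like this, assuming the explicit tag (if any) is passed in as a full tag string and using a simplified date-only pattern. `load_date_scalar` is a hypothetical helper, not part of PyYAML.

```python
import datetime
import re
from typing import Optional, Union

# Simplified, date-only stand-in for the implicit timestamp pattern (assumption).
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
TIMESTAMP_TAG = "tag:yaml.org,2002:timestamp"


def load_date_scalar(
    value: str, explicit_tag: Optional[str] = None
) -> Union[datetime.date, str]:
    """Strict for explicit timestamp tags, lenient for implicit matches."""
    if explicit_tag == TIMESTAMP_TAG:
        # Explicit tag: the author asked for a date, so validation applies
        # and any ValueError propagates to the caller.
        return datetime.date.fromisoformat(value)
    if explicit_tag is None and DATE_RE.fullmatch(value):
        # Implicit match: a failed cast means the type guess was wrong.
        try:
            return datetime.date.fromisoformat(value)
        except ValueError:
            return value
    # Any other explicit tag (e.g. !!str) or no pattern match: keep the text.
    return value
```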