psych icon indicating copy to clipboard operation
psych copied to clipboard

YAML.safe_load fails when a string contains a non-existent date

Open Fjan opened this issue 8 years ago • 22 comments

YAML.safe_load will raise an exception when you try to load text that happens to contain a sequence of numbers that looks like a date but is not:

s="2016-02-31"
YAML.safe_load(s.to_yaml)
# =>  Psych::DisallowedClass: Tried to load unspecified class: Date

Using YAML.load instead of safe_load works fine and text that contains a correct date works fine too. But this can be used to raise an exception on any application that uses YAML.safe_load on user provided text (accidentally or otherwise)

Fjan avatar Dec 23 '15 14:12 Fjan

I guess this is because Psych leans on the Date class to determine what is valid date or not. I'm not really keen on writing my own date validation logic. Do you have a suggestion for how to fix this?

tenderlove avatar Dec 23 '15 17:12 tenderlove

I took a brief look at the code but I don't understand yet why a valid date will not raise and exception. So it's doing validation somewhere anyway.

A workaround is to simply allow the Date class. Another option would be use modified tokeniser for safe_load so it only recognises allowed classes and turns the rest into strings.

Fjan avatar Dec 23 '15 17:12 Fjan

I took a brief look at the code but I don't understand yet why a valid date will not raise and exception. So it's doing validation somewhere anyway.

I don't follow. A valid date will raise an exception with safe_load:

irb(main):003:0> Psych.safe_load '2013-01-01'
Psych::DisallowedClass: Tried to load unspecified class: Date
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/class_loader.rb:97:in `find'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/class_loader.rb:28:in `load'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/class_loader.rb:39:in `block (2 levels) in <class:ClassLoader>'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/scalar_scanner.rb:70:in `tokenize'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:60:in `deserialize'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:123:in `visit_Psych_Nodes_Scalar'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/visitor.rb:16:in `visit'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/visitor.rb:6:in `accept'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:32:in `accept'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:311:in `visit_Psych_Nodes_Document'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/visitor.rb:16:in `visit'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/visitor.rb:6:in `accept'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych/visitors/to_ruby.rb:32:in `accept'
    from /Users/aaron/.rbenv/versions/ruby-trunk/lib/ruby/2.3.0/psych.rb:302:in `safe_load'
    from (irb):3
    from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:11:in `<main>'
irb(main):004:0>

A workaround is to simply allow the Date class. Another option would be use modified tokeniser for safe_load so it only recognises allowed classes and turns the rest into strings.

Both of these are major changes. I wouldn't be willing to do that in a bugfix release. I think returning strings would be fine, but I'm definitely not in favor of allowing Dates by default.

tenderlove avatar Dec 23 '15 17:12 tenderlove

I mean this:

Psych.safe_load('2013-01-01'.to_yaml)  # => "2013-01-01" 
Psych.safe_load('2013-02-31'.to_yaml)  # Psych::DisallowedClass

So some date parsing is apparently already happening somewhere, I just haven't been able to find it in the code, perhaps we can fix it there. I agree that allowing Date as an extra class in a bug fix release would not be appropriate, but this is a good workaround for people who have a problem and want a quick fix:

Psych.safe_load('2013-02-31'.to_yaml,[Date])   # => "2013-02-31"

Fjan avatar Dec 23 '15 18:12 Fjan

When you do '2013-01-01'.to_yaml, Psych looks at the string and realizes that it is ambiguous, so the resulting YAML is quoted:

irb(main):002:0> require 'psych'
=> true
irb(main):003:0> '2013-01-01'.to_yaml
=> "--- '2013-01-01'\n"
irb(main):004:0>

Note the single quotes around the date string in the resulting YAML.

Since '2013-02-31' isn't a valid date, it considers this to not be an ambiguous value, so it doesn't add the quotes around it:

irb(main):004:0> '2013-02-31'.to_yaml
=> "--- 2013-02-31\n...\n"
irb(main):005:0>

The single quotes tell the parser "this is absolutely a string, do not check for other values". The second value isn't ambiguous when dumping the YAML, but is ambiguous when parsing. Maybe that helps?

tenderlove avatar Dec 23 '15 18:12 tenderlove

Yes, I just realised that. So another avenue would be to fix .to_yaml to always quote strings that look like a date, that's probably much easier to do.

Perhaps we should have a .to_safe_yaml that can be used on user supplied input and can be relied upon to only create scalars. Sigh. But it's probably better to move to JSON (which I was hoping to avoid).

Fjan avatar Dec 23 '15 18:12 Fjan

Any news on this?

lucascaton avatar Jan 21 '17 06:01 lucascaton

[2] pry(main)> str = "tt: 2012-09-24T13:00:00"
=> "tt: 2012-09-24T13:00:00"
[7] pry(main)> YAML.safe_load(str)
Psych::DisallowedClass: Tried to load unspecified class: Time
from /usr/share/ruby/psych/class_loader.rb:97:in `find'

What is the problem here?

What I'm really after is stop Psych from parsing the date and then dumping back to YAML in a different format. I want to read the YAML, change some values and dump back to YAML. The problem is that once the date is parsed, then dumping back to YAML results in

---
tt: 2012-09-24 16:00:00.000000000 +03:00

This is why I tried whether safe_load will help. It is same problem as described here http://stackoverflow.com/questions/32730275

akostadinov avatar Mar 20 '17 15:03 akostadinov

I just got hit by this after the hashie library updated and they now call safe_load. I would really like to request that this method just returns strings for anything that is deemed unsafe to be parsed.

The difference is that on the end of the chain I can work with sanitized data. If I really want a Date object I can do the conversion myself. Or I can use it as a string, but if reading in a user supplied configuration file leads to a backtrace, than that's it. Game over. Epic fail. Given that a library like hashi is calling this means I have no control over the call to load vs safe_load, and even if I did, I would prefer to be safe and decide how I validate date strings before converting them, but I don't want it to become impossible to read in a perfectly valid yaml file.

My program doesn't need to have values automagically converted, but asking every user of my library for the eternity to never ever put an unquoted date in a yaml file is impossible.

najamelan avatar Aug 28 '18 20:08 najamelan

We are running into this issue as well. A user provided String that is attempted to be serialized and put into the database is 0000-00-00:

require 'psych'

original_valid_content = "---\n- >-\n  test\n- >-\n  1976-01-02"
original_invalid_content = "---\n- >-\n  test\n- >-\n  0000-00-00"

# Works
deserialized_content = Psych.safe_load(original_valid_content)
new_serialized_content = Psych.dump(deserialized_content)
# => "---\n- test\n- '1976-01-02'\n"
new_deserialized_content = Psych.safe_load(new_serialized_content)

# Breaks:
deserialized_content = Psych.safe_load(original_invalid_content)
new_serialized_content = Psych.dump(deserialized_content)
# => "---\n- test\n- 0000-00-00\n"
new_deserialized_content = Psych.safe_load(new_serialized_content)
# => Psych::DisallowedClass: Tried to load unspecified class: Date

Soleone avatar May 30 '19 22:05 Soleone

Looking at tenderlove's comment here maybe it might make most sense to be more aggressive with quoting on dump?

Ensure that a string that is not a valid date like 0000-00-00 gets quoted and treat it as "ambiguous". What would be a drawback of that type of solution?

Soleone avatar May 30 '19 22:05 Soleone

Ensure that a string that is not a valid date like 0000-00-00 gets quoted and treat it as "ambiguous". What would be a drawback of that type of solution?

Honestly, I can't think of any drawback. I'd merge a commit that does that.

On a different subject, maybe in the future we should add a safe_dump. I want to guarantee that objects can round trip through dump and load, but I'm not sure if we can make the same guarantee about dump -> safe_load. This case seems like we can though.

tenderlove avatar Aug 15 '19 18:08 tenderlove

Is there any way around the issue? A date notation like 20190101 works fine, however I do need 2019-01-01...

stiller-leser avatar Oct 01 '19 15:10 stiller-leser

If you want to make sure a YAML value is always treated like a string, you can cast it explicitly by prepending !!str . Example:

pry(main)> YAML.safe_load("!!str 2019-01-01")
=> "2019-01-01"

danielrehner avatar Apr 13 '20 18:04 danielrehner

Can anyone explain what is the problem with @najamelan's suggestion?

I would really like to request that this method just returns strings for anything that is deemed unsafe to be parsed.

Users can put quotes around dates themselves, but in many cases that may prevent interoperability between Ruby/Psych-based apps (in my case gollum) and other applications that parse YAML, when the latter do support the parsing of dates. If Psych just returned a string for anything that looks like a date, the user wouldn't need to worry about how to specify their data.

dometto avatar Jun 11 '20 09:06 dometto

ping

MatzFan avatar Feb 05 '21 17:02 MatzFan

Users can put quotes around dates themselves, but in many cases that may prevent interoperability between Ruby/Psych-based apps

Fully agree with @dometto.

The YAML spec does support "date" natively: https://yaml.org/spec/1.2.2/

Example 2.22 Timestamps

date: 2002-12-14

Example 2.23 Various Explicit Tags

not-date: !!str 2002-04-28

It is clear that quoting or escaping the date is not the correct solution (according to the spec).

Is there a plan to resolve this issue? @najamelan's suggestion of "just returns strings for anything that is deemed unsafe to be parsed." seems reasonable as it allows for the user to handle any "unsafe" cases.

Thanks!

ronaldtse avatar Jan 13 '22 10:01 ronaldtse

Just a comment about the 1.2.2 spec and the timestamp example: None of the recommended schemas in YAML 1.2 provide a timestamp tag anymore, and the example 2.22 is a leftover that should probably have been removed. I created an issue: https://github.com/yaml/yaml-spec/issues/268

perlpunk avatar Jan 14 '22 11:01 perlpunk

Thanks @perlpunk for clarifying the spec -- is the YAML way for handling a date/time now a string?

ronaldtse avatar Jan 15 '22 05:01 ronaldtse

@ronaldtse yes, in YAML 1.2 it would be loaded as a string, and the app can turn it into a date. Or the specific YAML loader can add a custom tag resolver the same way it was working in 1.1. In 1.1 there was no standard set of tags to be implemented, so some implement merge, date, binary, and some didn't. In 1.2 there is a recommended Core schema that every YAML library should implement. That will hopefully increase interoperability. But it has less tags than the complete 1.1 tag repository.

perlpunk avatar Jan 18 '22 11:01 perlpunk

Was dealing with a similar date parsing issue in js-yaml: https://github.com/nodeca/js-yaml/issues/477, and I'm sorry to disagree with @ronaldtse, but the spec is definitely impossibly ambiguous about those timestamp examples and we can't simply use RFC 3339 (or even a faithfully minimal extension of it) to support those examples.

To quote myself:

Just found this and I've got to say, the spec is way to ambiguous on its definition of timestamps. YAML 1.2 §10.1. Failsafe Schema doesn't actually provide any guidance about what to do with Example 2.22 Timestamps even though it's implied that it shouldn't need an explicit tag. As an editor of other specs, I'd call this a spec bug, not just an implementation one. So congrats, you're both right!

I'm inclined to say we should defer to RFC 3339 Date and Time on the Internet: Timestamps §5.6 Internet Date/Time Format, which covers the canonical, iso8601 and date examples, but I really don't know what to do about spaced: 2001-12-14 21:59:43.10 -5. If it were me, I'd say allowing space would go from RFC 3339's:

   time-numoffset  = ("+" / "-") time-hour ":" time-minute
   time-offset     = "Z" / time-numoffset

   partial-time    = time-hour ":" time-minute ":" time-second
                     [time-secfrac]
   full-date       = date-fullyear "-" date-month "-" date-mday
   full-time       = partial-time time-offset

   date-time       = full-date "T" full-time

to instead have:

   full-time       = partial-time [" "] time-offset
   date-time       = full-date ("T" / "t" / " ") full-time

But then we don't either allow 2019-03-13 20:18:42 +0000 (no colon) or 2001-12-14 21:59:43.10 -5 (not even remotely close to a time-offset).

I think that a well specified standard would require not just that the date-time match the ABNF, but that it be a valid date. If we deferred to RFC3339, it says:

    date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                             ; month/year

Which pretty clearly implies 2016-02-31 is not an RFC3339 date.

Personally, I think the horses have left the barn and the spec should suggest a reasonable middle ground. Most YAML parsers have date parsing, it's acting differently everywhere, both inbound and outbound. We create standards to avoid this (well, in theory ;-).

josephholsten avatar Sep 30 '22 01:09 josephholsten

Psych::ClassLoader::ALLOWED_PSYCH_CLASSES = [ Time ]

module Psych
  class ClassLoader
    ALLOWED_PSYCH_CLASSES = [] unless defined? ALLOWED_PSYCH_CLASSES
    class Restricted < ClassLoader
      def initialize classes, symbols
        @classes = classes + Psych::ClassLoader::ALLOWED_PSYCH_CLASSES.map(&:to_s)
        @symbols = symbols
        super()
      end
    end
  end
end 

It works for me http://sundivenetworks.com/archive/2021/tried-to-load-unspecified-class-time-psych-disallowedclass.html

anke1460 avatar Dec 06 '22 09:12 anke1460