dry-types icon indicating copy to clipboard operation
dry-types copied to clipboard

fast coercion date and time

Open ermolaev opened this issue 4 years ago • 9 comments

I benchmarked dry-types and activemodel and noticed that dry-types has low perfomance, and I dip inside code rails

https://github.com/rails/rails/blob/main/activemodel/lib/active_model/type/helpers/time_value.rb#L72 https://github.com/rails/rails/blob/main/activemodel/lib/active_model/type/date.rb#L30

rails use regexp for frequent case

ermolaev avatar Feb 02 '21 22:02 ermolaev

@ermolaev why isn't it a patch to Ruby?

flash-gordon avatar Feb 03 '21 08:02 flash-gordon

I mean it sounds strange we invent "performance-optimized" code and copy it between line libraries... What if there's another, even faster set of regex? :)

flash-gordon avatar Feb 03 '21 08:02 flash-gordon

Just what I said lol https://github.com/rails/rails/commit/9ea15f1927393a88c0a5dbe250420b7afdfa6aed

flash-gordon avatar Feb 03 '21 08:02 flash-gordon

Another thought (sorry for spamming): we should provide a way to fix the format to ISO8601. Mb it should be a different key, e.g. :iso_time and :iso_date. It'll be 💯 faster.

flash-gordon avatar Feb 03 '21 08:02 flash-gordon

why isn't it a patch to Ruby?

ruby has other methods for parsing time in strict formats

Time.iso8601("2011-10-05T22:26:12-04:00")
Time.strptime("2000-10-31", "%Y-%m-%d") 

this methods is fast, but this is a very strict format logic, this methods throw exeptions, and for fallback we need handle it, but exeptions is very slow

Dry::Types["params.date"] Dry::Types["json.date"] apply string in customs format and it more robust code for parsing times

Another thought (sorry for spamming): we should provide a way to fix the format to ISO8601. Mb it should be a different key, e.g. :iso_time and :iso_date. It'll be 100 faster.

I think new keys will be confuse developers, and nobody will use them, also better to encapsulate the parsing details in the library

ermolaev avatar Feb 03 '21 16:02 ermolaev

this methods is fast, but this is a very strict format logic, this methods throw exeptions, and for fallback we need handle it, but exeptions is very slow

But we don't define Date.parse and Time.parse either. Shouldn't they be improved in Ruby instead? Currently, what we discuss here started as a custom parsing method for parsing time in the mysql adapter (!) in rails (https://github.com/rails/rails/commit/bd087e6716f8b48d48386c3d42e529849d7b68cd). It looks like nobody tried to backport it to Ruby itself.

I think new keys will be confuse developers, and nobody will use them, also better to encapsulate the parsing details in the library

These are strong assumptions and I don't share them. It's true that being faster by default is better but we also need to consider the price (support, bugs, etc). In my opinion, parsing dates without a predefined format is a potential source of errors. Delegating format detection to Time.parse you create an implicit dependency on how it operates. Not only it's slower, but it can also change quite randomly accepting more/fewer formats. This is likely not a problem for params but I'd expect json types to be more strict.

This is what I use personally

      TypeContainer.register(
        'json.iso_time',
        ::Types::Time.constrained(
          gteq: Time.new(2014, 1, 1),
          lteq: Time.new(2038, 1, 1)
        ).constructor { |value, &failed|
          Try[::ArgumentError] { ::Time.iso8601(value.sub(' ', 'T')) }.value_or {
            failed.(value)
          }
        }
      )

I'm not against the change (though I'm not sure about the implementation) but I think we should try adding it to Ruby first.

Side note: is there a safe strptime in Ruby?

flash-gordon avatar Feb 03 '21 17:02 flash-gordon

I looked it up, seems like using Date._strptime or Date._iso8601 would be faster and cleaner. I checked, they are even defined in JRuby. WDYT?

flash-gordon avatar Feb 03 '21 18:02 flash-gordon

I mean it sounds strange we invent "performance-optimized" code and copy it between line libraries... What if there's another, even faster set of regex? :)

Ruby has regexp for Time.iso8601 https://github.com/ruby/ruby/blob/v3_0_0/lib/time.rb#L614 :scream:

I looked it up, seems like using Date._strptime or Date._iso8601 would be faster and cleaner. I checked, they are even defined in JRuby. WDYT?

Date benchmark

ISO_DATE = /\A(\d{4})-(\d\d)-(\d\d)\z/
string = '2011-12-03'
Benchmark.ips do |x|
  x.report('parse') do |n|
    while n > 0
      Date.parse(string)
      n-=1
    end
  end

  x.report('_parse') do |n|
    while n > 0
      ::Date.new(*::Date._parse(string, false).values_at(:year, :mon, :mday))
      n-=1
    end
  end

  x.report('_iso8601') do |n|
    while n > 0
      ::Date.new(*::Date._iso8601(string).values_at(:year, :mon, :mday))
      n-=1
    end
  end

  x.report('_strptime') do |n|
    while n > 0
      ::Date.new(*Date._strptime(string, '%Y-%m-%d').values_at(:year, :mon, :mday))
      n-=1
    end
  end

  x.report('_strptime.values') do |n|
    while n > 0
      ::Date.new(*Date._strptime(string, '%Y-%m-%d').values)
      n-=1
    end
  end

  x.report('regexp') do |n|
    while n > 0
      if string =~ ISO_DATE
        ::Date.new($1.to_i, $2.to_i, $3.to_i)
      end
      n-=1
    end
  end

  x.compare!
end

# Comparison:
#     _strptime.values:  1282303.3 i/s
#            _strptime:  1194136.2 i/s - same-ish: difference falls within error
#               regexp:   997567.6 i/s - 1.29x  (± 0.00) slower
#             _iso8601:   529221.1 i/s - 2.42x  (± 0.00) slower
#                parse:   282310.3 i/s - 4.54x  (± 0.00) slower
#               _parse:   270369.9 i/s - 4.74x  (± 0.00) slower

Time benchmark

ISO_DATETIME = /
        \A
        (\d{4})-(\d\d)-(\d\d)(?:T|\s)            # 2020-06-20T
        (\d\d):(\d\d):(\d\d)(?:\.(\d{1,6})\d*)?  # 10:20:30.123456
        (?:(Z(?=\z)|[+-]\d\d)(?::?(\d\d))?)?     # +09:00
        \z
      /x.freeze
def fast_string_to_time(string)
  return unless ISO_DATETIME =~ string

  usec = $7.to_i
  usec_len = $7&.length
  if usec_len&.< 6
    usec *= 10**(6 - usec_len)
  end

  if $8
    offset = $8 == "Z" ? 0 : $8.to_i * 3600 + $9.to_i * 60
  end

  ::Time.local($1.to_i, $2.to_i, $3.to_i, $4.to_i, $5.to_i, $6.to_i, usec, offset)
end

string = '2021-02-04T01:48:47.551+03:00'

Benchmark.ips do |x|
  x.report('parse') do |n|
    while n > 0
      ::Time.parse(string)
      n-=1
    end
  end

  x.report('iso8601') do |n|
    while n > 0
      ::Time.iso8601(string)
      n-=1
    end
  end

  x.report('_parse') do |n|
    while n > 0
      ::Time.new(*::Date._parse(string, false).values_at(:year, :mon, :mday, :hour, :min, :sec, :sec_fraction))
      n-=1
    end
  end

  x.report('_iso8601') do |n|
    while n > 0
      ::Time.new(*::Date._iso8601(string).values_at(:year, :mon, :mday, :hour, :min, :sec, :sec_fraction))
      n-=1
    end
  end

  x.report('_strptime') do |n|
    while n > 0
      ::Time.new(*Date._strptime(string, '%Y-%m-%dT%H:%M:%S.%N%z').values_at(:year, :mon, :mday, :hour, :min, :sec, :sec_fraction))
      n-=1
    end
  end

  x.report('regexp') do |n|
    while n > 0
      fast_string_to_time(string)
      n-=1
    end
  end

  x.compare!
end

# Comparison:
#               regexp:   201472.4 i/s
#            _strptime:   159658.5 i/s - 1.26x  (± 0.00) slower
#             _iso8601:   140898.1 i/s - 1.43x  (± 0.00) slower
#              iso8601:   138998.2 i/s - 1.45x  (± 0.00) slower
#               _parse:    66379.7 i/s - 3.04x  (± 0.00) slower
#                parse:    57152.7 i/s - 3.53x  (± 0.00) slower

ermolaev avatar Feb 03 '21 23:02 ermolaev

Thanks for this! A minor heads up. I benchmarked various options and looked into the ruby sources. It turns out that ruby itself relies on regular expressions internally. My benchmarking showed approx 20% difference between ._sptrptime and regexes though ._strptime is more accurate (doesn't parse invalid dates while regexes aren't so strict). I'm not yet sure which way is better, I'll get back to it sometime soon.

flash-gordon avatar Feb 10 '21 18:02 flash-gordon