activerecord-jdbc-adapter icon indicating copy to clipboard operation
activerecord-jdbc-adapter copied to clipboard

improve AR timestamp column Date/Time parsing

Open kares opened this issue 11 years ago • 7 comments

if doable http://stackoverflow.com/questions/22100728/rails-query-slow-but-fast-in-pgadmin

--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/1437224-improve-ar-timestamp-column-date-time-parsing?utm_campaign=plugin&utm_content=tracker%2F136963&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F136963&utm_medium=issues&utm_source=github).

kares avatar Mar 07 '14 07:03 kares

The timestamp parsing does seem slower under JRuby and the JDBC adapter, and I don't have an answer to that, but I'm also seeing a difference in adapter behavior that is possibly contributing to the slowness in some circumstances. I think it's possibly related to the changes made in #349. That change seems to force the string to timestamp conversion to happen as soon as the data is selected from the database. However, it looks like Rails's default adapter defers this conversion process until that specific attribute is actually read from the object. In some situations this may be leading to unnecessary slowness due to the forced up-front conversions.

Here's a simple test script that logs each time string_to_time is called on the column object. And here are the results.

As you can see, string_to_time gets called immediately under activerecord-jdbc-adapter with Rails 4 unless the arjdbc.datetime.raw property is set to true. With arjdbc.datetime.raw=true then is seems to behave like the default Rails adapter under Rails 4.

However, this leads to my confusion, since if this is needed because of the custom select issue in #349, I guess I'm not seeing the original issue. Even with arjdbc.datetime.raw=true, custom selects seem to still result in Time objects being returned (as the output from this test script also shows). And with arjdbc.datetime.raw forced to true, I was still able to get the test suites to pass against sqlite and postgresql. So this leads to my question of why arjdbc.datetime.raw is needed and is set to false by default under Rails 4 (but it's totally possible I'm just not testing the right things or not understanding the original issue).

Getting back to the timestamp slowness, deferring the conversion like the default adapter seems to dramatically speed things up if timestamp fields are selected, but never accessed. If the timestamp fields on each result will be eventually accessed, then this obviously just delays the slowness, so this isn't exactly a fix, but given Rails's default select * happiness, this might be a worthwhile optimization for cases when you don't actually ever access the timestamp columns (and this also seems to match Rails's default behavior).

I've put together a broader set of benchmarks to hopefully help with this: https://github.com/GUI/activerecord-jdbc-adapter-benchmarks But basically the gist of it regarding this timestamp conversion is that when fetching 5,000 records and only selecting a single datetime column (repeated 30 times), here's how long each environment tool:

  • MRI + Rails 4
    • Fetch records only: 3.51 seconds
    • Fetch records and access datetime attribute: 6.74 seconds
  • JRuby + Rails 4
    • Fetch records only: 15.82 seconds
    • Fetch records and access datetime attribute: 16.32 seconds
  • JRuby + Rails 4 + arjdbc.datetime.raw=true
    • Fetch records only: 3.10 seconds
    • Fetch records and access datetime attribute: 16.60 seconds

So while arjdbc.datetime.raw=true gets it on parity with Rails's default adapter for the initial fetch, when the deferred conversions happen, it's still a fair bit slower.

GUI avatar Mar 15 '14 21:03 GUI

excellent ... thanks you @GUI ... I wish I could jump run into experimenting with optimizing the code :)

kares avatar Mar 16 '14 08:03 kares

I think I've uncovered part of it: https://github.com/jruby/jruby/issues/1563 Basically, due to a seeming issue in JRuby, ActiveRecord's fast_string_to_time never gets used, so JRuby is always forced to use the slower fallback_string_to_time.

So I see a few options here: wait to see if this is fixable in JRuby, see if we can get Rails to change their implementation to workaround this to_r issue, or override that method in this adapter to provide the jruby-ified workaround (which basically just consist of: https://gist.github.com/GUI/9584445). Any opinions? Overriding it in this adapter might allow for the fix to be pushed out more quickly to all activerecord-jdbc users (regardless of ActiveRecord or JRuby version). But since this also seems like an issue elsewhere, I'm not sure if you want to deal with adding such workarounds to this code-base. If you do want the workaround for this code base, just let me know, and I'd be happy to submit a pull request.

With a fixed version of fast_string_to_time in place, it helps speed things up considerably, and gets it pretty much inline with MRI, if not slightly faster. That same test above results in:

  • JRuby + Rails 4
    • Fetch records only: 4.79 seconds
    • Fetch records and access datetime attribute: 5.38 seconds
  • JRuby + Rails 4 + arjdbc.datetime.raw=true
    • Fetch records only: 3.15 seconds
    • Fetch records and access datetime attribute: 5.86 seconds

However, while experimenting with other things in the code, I was able to improve the performance a bit more by simply bypassing Rails's default string_to_time dance. It struck me as somewhat odd that the adapter first fetches the data as a jdbc java timestamp, then converts that into a string, and then converts that string into a ruby time object. Rails's default adapters seem to do this same string conversion dance, so it makes sense to follow that, but I'm wondering if it's entirely necessary for the JDBC adapter. It seems more efficient (and faster in my initial rough tests) to simply convert from something like a jdbc timestamp object directly into a ruby time object, bypassing all the string conversion and parsing.

I'm just guessing here, but it seems like the string conversion stuff might have stemmed from Rails standardizing how things came from multiple adapters. However, since JDBC already seems to normalize these data type, it seems like extra work to convert these to intermediary strings. Do you have any opinions on bypassing this normal timestamp->string->time dance inside this jdbc adapter and just going from java timestamp->ruby time? Does that seem okay, or does that deviate from activerecord's default behavior too much? Or if it would help to see a fuller code example of how it might work like this, I can try to put that together at some point.

Thanks!

GUI avatar Mar 16 '14 15:03 GUI

@GUI have "suffered" a fair amount of time making times compatible in AR-JDBC ... and I have to say it's still "hacky" (but at least many things such as custom-selects are working as opposed to 1.2.x) ... I do not recall all the details it's probably easy to find out looking at history. Certainly it's not optimal and I've been experimenting with returning Ruby wrapped time values (instead of strings) but than reverted to not using (I do not recall why at this point)... e.g. https://github.com/jruby/activerecord-jdbc-adapter/blob/b385ef/src/java/arjdbc/jdbc/RubyJdbcConnection.java#L1698-L1714 shows the madness (introduced at https://github.com/jruby/activerecord-jdbc-adapter/commit/3e91b47b0915e4bc7a1c528b3283fc30f7681665)

so please sure by all means, experiment with "fixing" those in terms of performance (note that PG's JDBC connection had overriden some of the JdbcConnection' methods e.g. https://github.com/jruby/activerecord-jdbc-adapter/blob/b385ef/src/java/arjdbc/postgresql/PostgreSQLRubyJdbcConnection.java#L442)

p.s. apologize for possible future long replies, but I really need to concentrate on a few other things currently ... it's really a matter of try and see anyways ( until all the tests stop failing :)

kares avatar Mar 16 '14 15:03 kares

@GUI, have you had a chance to rerun the benchmarks now that jruby 1.7.12 is released with the fix for https://github.com/jruby/jruby/issues/1563 and are you seeing the expected speed increase?

atambo avatar Apr 26 '14 18:04 atambo

@atambo I've looked into it ... run @GUI's benchmark comparing 1.7.11 and 1.7.12, there's about a 5x performance increase with queries that load datetime / timestamp attributes https://gist.github.com/kares/45b97dcc68a78c589a35 :

30x SELECT *, 5000 records                                  41.060000   0.490000  41.550000 ( 39.162000)
30x SELECT *, 5000 records - get attributes                 48.660000   0.550000  49.210000 ( 46.121000)
30x SELECT sample_datetime, 5000 records                    15.020000   0.050000  15.070000 ( 15.221000)
30x SELECT sample_datetime, 5000 records - get attributes   15.690000   0.050000  15.740000 ( 15.860000)
30x SELECT sample_timestamp, 5000 records                   15.040000   0.050000  15.090000 ( 15.241000)
30x SELECT sample_timestamp, 5000 records - get attributes  15.690000   0.050000  15.740000 ( 15.806000)
30x SELECT *, 5000 records                                  15.020000   0.450000  15.470000 ( 14.613000)
30x SELECT *, 5000 records - get attributes                 21.040000   0.420000  21.460000 ( 20.519000)
30x SELECT sample_datetime, 5000 records                     3.100000   0.020000   3.120000 (  3.374000)
30x SELECT sample_datetime, 5000 records - get attributes    3.750000   0.030000   3.780000 (  4.019000)
30x SELECT sample_timestamp, 5000 records                    3.090000   0.020000   3.110000 (  3.329000)
30x SELECT sample_timestamp, 5000 records - get attributes   3.730000   0.020000   3.750000 (  4.034000)

kares avatar Apr 27 '14 15:04 kares

@kares awesome!

atambo avatar Apr 27 '14 16:04 atambo