xtdb icon indicating copy to clipboard operation
xtdb copied to clipboard

Support for interval functions, §6.33/§6.34/§8.20/§10.1

Open jarohen opened this issue 3 years ago • 8 comments

  • [x] basic single-part Interval constructors
  • [x] interval + / - / * / /
  • [x] 2 component constructors e.gYEAR TO MONTH
  • [x] interval fractional seconds / (single field parsing)
  • [x] xtdb/xtdb#2305
  • [x] xtdb/xtdb#2306
  • [x] xtdb/xtdb#2307
  • [x] xtdb/xtdb#2308
  • [ ] interval precision constraints
  • [x] interval_literal plan extension
  • [ ] interval fractional multiplication / div
  • [ ] interval overflow tests (should get ex)

https://github.com/cockroachdb/cockroach/blob/ed695f004a9da72957a3bcfcfedcc9c2d389a257/pkg/sql/logictest/testdata/logic_test/interval

jarohen avatar Apr 06 '22 14:04 jarohen

See https://github.com/xtdb/core2/blob/master/doc/adr/0034-interval.adoc for summary of current open questions

jarohen avatar May 16 '22 13:05 jarohen

Was looking at equality / comparison, brought up the problem of differing representations and comparison.

https://github.com/xtdb/core2/blob/master/doc/adr/0034-interval.adoc

wotbrew avatar Jun 08 '22 09:06 wotbrew

Thanks for the write-up, @wotbrew :pray:

With that in mind, I'd propose (RFC):

  • generally speaking, option B. the Arrow data format is fundamental to our engine, so the closer we can keep to Arrow's semantics (so long as they don't differ too significantly), the simpler it'll be.
  • looking at the interval units:
    • YEAR_MONTH seems to unify fine, so long as we make the decision that 12m = 1y - I'm personally happy with this one. Arrow themselves have explicitly made this call - they could have represented it with multiple numbers as with the other intervals, but chose not to.
    • I'd map SQL's DAY_TIME to Arrow's MONTH_DAY_NANO (with 0 months), as the latter is a sub-type of the former (as opposed to Arrow's DAY_TIME, which only supports down to millis).
      • this means we won't ever see Arrow DAY_TIMEs except for externally provided Arrow.
  • the conversion/comparison decisions then remain in Arrow space rather than SQL, especially given that SQL doesn't permit comparisons between YEAR_MONTH and DAY_TIME. I'd use this as reason to say that 30d != 1m. looking at comparisons between Arrow interval units, I think these decisions lead to a consistent overall approach (although obv I may have missed a case):
    • YEAR_MONTH is incomparable with DAY_TIME.
    • YEAR_MONTH = MONTH_DAY_NANO only if the day/nano parts are 0.
    • DAY_TIME = MONTH_DAY_NANO only if the month part is 0
    • this may lead to some edge cases - e.g. if you add one month and then 11 months, you might get a different date to if you added 11 and then one, or if you just added 12 months. I think this would be the same if you were to use Java Periods, though. there're likely some other edge cases - if you can think of any, let's discuss them.
  • I'd hence likely use our own logic for adding intervals to intervals/intervals to durations, rather than converting to Java objects, and only convert to Java objects if you needed to add intervals/durations to dates - I'm aware this would mean changing the interval representation in the EE to our own type box.

As I say, RFC - time for the inevitable 'what have I missed?' question!

jarohen avatar Jun 10 '22 13:06 jarohen

the conversion/comparison decisions then remain in Arrow space rather than SQL, especially given that SQL doesn't permit comparisons between YEAR_MONTH and DAY_TIME. I'd use this as reason to say that 30d != 1m.

Just to clarify the problem of days and months exist because of arrow rather than SQL (e.g is P1M < P60D, is P1D < P400H). You don't need different interval types in comparisons to have this problem, just MONTH_DAY_NANO.

Thoughts? Have I misunderstood?


Another unrelated question,

I'd hence likely use our own logic for adding intervals to intervals/intervals to durations, rather than converting to Java objects, and only convert to Java objects if you needed to add intervals/durations to dates - I'm aware this would mean changing the interval representation in the EE to our own type box.

We currently use PeriodDuration as the one runtime representation of intervals, are you suggesting breaking out into different types for YEAR_MONTH / DAY_TIME or something else? Just making sure I understand.

wotbrew avatar Jun 10 '22 15:06 wotbrew

You don't need different interval types in comparisons to have this problem, just MONTH_DAY_NANO. Have I misunderstood?

Nope, that's one I missed - thanks :thinking:

So, if we were to say 30d=1m, would we see it in normal SQL operation? i.e. if no SQL type can map into a MDN with non-zero months, and we disallowed comparing YM to DT and MDN, you couldn't arrive in this situation...? If so, I'd be less opposed to 30d=1m, I think - particularly if you're only going to run into it when the user has explicitly asked to compare something with months to something with days.

We currently use PeriodDuration as the one runtime representation of intervals, are you suggesting breaking out into different types for YEAR_MONTH / DAY_TIME or something else? Just making sure I understand.

I think so, yep - my reasoning was that a lot of the arithmetic would then be 'just' component-wise arithmetic, IIUC. We'd want to use java.time for Date operations, but we may 'just' be able to use things like ZDT/.plusMonths and friends for those.

jarohen avatar Jun 13 '22 09:06 jarohen

if no SQL type can map into a MDN with non-zero months

In XTDB right now you can derive MONTH_DAY_NANO with arithmetic, e.g do 1 YEAR + 1 DAY. Let's imagine we disallow that, then external arrow would be the only way we see these mixed intervals. I would think (perhaps naively) such composition is valuable, working around it would be a big pain.

Option C perhaps needs another look. java.time.Period is just not Comparable, and nobody complains. But you can add them together freely, equality is defined structurally on the fields.

my reasoning was that a lot of the arithmetic would then be 'just' component-wise arithmetic, IIUC

Sure, PeriodDuration gives us the fields we need but we could invent our own type. At the time everyone seemed very hesitant to public class XTDBInterval.

One issue I can see with simple field arith may be overflow, e.g you can promote up certain fields to avoid overflow, not sure if we want to do this - this is something java.time does, e.g:

if ((secondsToAdd | nanosToAdd) == 0) {
            return this;
}
long epochSec = Math.addExact(seconds, secondsToAdd);
epochSec = Math.addExact(epochSec, nanosToAdd / NANOS_PER_SECOND);
nanosToAdd = nanosToAdd % NANOS_PER_SECOND;
long nanoAdjustment = nanos + nanosToAdd;  // safe int+NANOS_PER_SECOND
return ofSeconds(epochSec, nanoAdjustment);

wotbrew avatar Jun 13 '22 09:06 wotbrew

Option C perhaps needs another look. java.time.Period is just not Comparable, and nobody complains. But you can add them together freely, equality is defined structurally on the fields.

I wasn't keen on option C because of the creation of the extra ArrowType - IMO this should be a last resort, as we'd be the only Arrow user that knew about it, and (unlike our other extension types), other users falling back to the underlying storage type isn't really viable.

Good point about Period disallowing comparisons, though - if this were allowed in the SQL spec, it'd be tempting?

jarohen avatar Jun 13 '22 14:06 jarohen

Sorry, I meant option C in regards to simply disallowing comparison - perhaps some middle ground. All arrow, field-wise equality - disabled comparison. If you want to compare as a user, you have to extract fields you are interested in and do arithmetic where you decide the behaviour.

This breaks spec AFAICT.

wotbrew avatar Jun 14 '22 08:06 wotbrew