Support for interval functions, §6.33/§6.34/§8.20/§10.1
- [x] basic single-part Interval constructors
- [x] interval
+/-/*// - [x] 2 component constructors e.g
YEAR TO MONTH - [x] interval fractional seconds / (single field parsing)
- [x] xtdb/xtdb#2305
- [x] xtdb/xtdb#2306
- [x] xtdb/xtdb#2307
- [x] xtdb/xtdb#2308
- [ ] interval precision constraints
- [x] interval_literal plan extension
- [ ] interval fractional multiplication / div
- [ ] interval overflow tests (should get ex)
https://github.com/cockroachdb/cockroach/blob/ed695f004a9da72957a3bcfcfedcc9c2d389a257/pkg/sql/logictest/testdata/logic_test/interval
See https://github.com/xtdb/core2/blob/master/doc/adr/0034-interval.adoc for summary of current open questions
Was looking at equality / comparison, brought up the problem of differing representations and comparison.
https://github.com/xtdb/core2/blob/master/doc/adr/0034-interval.adoc
Thanks for the write-up, @wotbrew :pray:
With that in mind, I'd propose (RFC):
- generally speaking, option B. the Arrow data format is fundamental to our engine, so the closer we can keep to Arrow's semantics (so long as they don't differ too significantly), the simpler it'll be.
- looking at the interval units:
YEAR_MONTHseems to unify fine, so long as we make the decision that 12m = 1y - I'm personally happy with this one. Arrow themselves have explicitly made this call - they could have represented it with multiple numbers as with the other intervals, but chose not to.- I'd map SQL's
DAY_TIMEto Arrow'sMONTH_DAY_NANO(with 0 months), as the latter is a sub-type of the former (as opposed to Arrow'sDAY_TIME, which only supports down to millis).- this means we won't ever see Arrow
DAY_TIMEs except for externally provided Arrow.
- this means we won't ever see Arrow
- the conversion/comparison decisions then remain in Arrow space rather than SQL, especially given that SQL doesn't permit comparisons between
YEAR_MONTHandDAY_TIME. I'd use this as reason to say that 30d != 1m. looking at comparisons between Arrow interval units, I think these decisions lead to a consistent overall approach (although obv I may have missed a case):YEAR_MONTHis incomparable withDAY_TIME.YEAR_MONTH=MONTH_DAY_NANOonly if the day/nano parts are 0.DAY_TIME=MONTH_DAY_NANOonly if the month part is 0- this may lead to some edge cases - e.g. if you add one month and then 11 months, you might get a different date to if you added 11 and then one, or if you just added 12 months. I think this would be the same if you were to use Java
Periods, though. there're likely some other edge cases - if you can think of any, let's discuss them.
- I'd hence likely use our own logic for adding intervals to intervals/intervals to durations, rather than converting to Java objects, and only convert to Java objects if you needed to add intervals/durations to dates - I'm aware this would mean changing the interval representation in the EE to our own type box.
As I say, RFC - time for the inevitable 'what have I missed?' question!
the conversion/comparison decisions then remain in Arrow space rather than SQL, especially given that SQL doesn't permit comparisons between YEAR_MONTH and DAY_TIME. I'd use this as reason to say that 30d != 1m.
Just to clarify the problem of days and months exist because of arrow rather than SQL (e.g is P1M < P60D, is P1D < P400H). You don't need different interval types in comparisons to have this problem, just MONTH_DAY_NANO.
Thoughts? Have I misunderstood?
Another unrelated question,
I'd hence likely use our own logic for adding intervals to intervals/intervals to durations, rather than converting to Java objects, and only convert to Java objects if you needed to add intervals/durations to dates - I'm aware this would mean changing the interval representation in the EE to our own type box.
We currently use PeriodDuration as the one runtime representation of intervals, are you suggesting breaking out into different types for YEAR_MONTH / DAY_TIME or something else? Just making sure I understand.
You don't need different interval types in comparisons to have this problem, just MONTH_DAY_NANO. Have I misunderstood?
Nope, that's one I missed - thanks :thinking:
So, if we were to say 30d=1m, would we see it in normal SQL operation? i.e. if no SQL type can map into a MDN with non-zero months, and we disallowed comparing YM to DT and MDN, you couldn't arrive in this situation...? If so, I'd be less opposed to 30d=1m, I think - particularly if you're only going to run into it when the user has explicitly asked to compare something with months to something with days.
We currently use PeriodDuration as the one runtime representation of intervals, are you suggesting breaking out into different types for YEAR_MONTH / DAY_TIME or something else? Just making sure I understand.
I think so, yep - my reasoning was that a lot of the arithmetic would then be 'just' component-wise arithmetic, IIUC. We'd want to use java.time for Date operations, but we may 'just' be able to use things like ZDT/.plusMonths and friends for those.
if no SQL type can map into a MDN with non-zero months
In XTDB right now you can derive MONTH_DAY_NANO with arithmetic, e.g do 1 YEAR + 1 DAY. Let's imagine we disallow that, then external arrow would be the only way we see these mixed intervals. I would think (perhaps naively) such composition is valuable, working around it would be a big pain.
Option C perhaps needs another look. java.time.Period is just not Comparable, and nobody complains. But you can add them together freely, equality is defined structurally on the fields.
my reasoning was that a lot of the arithmetic would then be 'just' component-wise arithmetic, IIUC
Sure, PeriodDuration gives us the fields we need but we could invent our own type. At the time everyone seemed very hesitant to public class XTDBInterval.
One issue I can see with simple field arith may be overflow, e.g you can promote up certain fields to avoid overflow, not sure if we want to do this - this is something java.time does, e.g:
if ((secondsToAdd | nanosToAdd) == 0) {
return this;
}
long epochSec = Math.addExact(seconds, secondsToAdd);
epochSec = Math.addExact(epochSec, nanosToAdd / NANOS_PER_SECOND);
nanosToAdd = nanosToAdd % NANOS_PER_SECOND;
long nanoAdjustment = nanos + nanosToAdd; // safe int+NANOS_PER_SECOND
return ofSeconds(epochSec, nanoAdjustment);
Option C perhaps needs another look.
java.time.Periodis just not Comparable, and nobody complains. But you can add them together freely, equality is defined structurally on the fields.
I wasn't keen on option C because of the creation of the extra ArrowType - IMO this should be a last resort, as we'd be the only Arrow user that knew about it, and (unlike our other extension types), other users falling back to the underlying storage type isn't really viable.
Good point about Period disallowing comparisons, though - if this were allowed in the SQL spec, it'd be tempting?
Sorry, I meant option C in regards to simply disallowing comparison - perhaps some middle ground. All arrow, field-wise equality - disabled comparison. If you want to compare as a user, you have to extract fields you are interested in and do arithmetic where you decide the behaviour.
This breaks spec AFAICT.