zed icon indicating copy to clipboard operation
zed copied to clipboard

Support month unit and proper years in duration

Open chrismo opened this issue 1 year ago • 4 comments

Primitive Values in the 1.18 docs show that "month" is not included unit in the type:

Valid time units are "ns" (nanosecond), "us" (microsecond), "ms" (millisecond), "s" (second), "m" (minute), "h" (hour), "d" (day), "w" (7 days), and "y" (365 days). Note that each of these time units accurately represents its calendar value, except for the "y" unit, which does not reflect leap years and so forth. Instead, "y" is defined as the number of nanoseconds in 365 days.

I presume this is because dealing with the varying lengths of months is some work, same as dealing with leap years. I have a use-case working moving dates where having month support would be handy. Instead, I'll drop back to the shell and use GNU date for what I need.

chrismo avatar Oct 13 '24 13:10 chrismo

@chrismo: Indeed, it seems to be one of those problems with no single pure solution, so users are stuck figuring out what compromises they're willing to make and settling for a "good enough" approach that suits their needs. Debates on this topic can be found attached to many of the "duration" libraries for various programming languages, and this StackOverflow post seems to do a decent job of walking through much of the topic.

If we can add enhancements that help get closer to "good enough", I expect use case details from users like you or anyone else watching this issue would help a lot.

Just doing some general riffing along the lines of how I see other folks coping, some just take the simplistic approach just using an approximate month measured in days, e.g., if trying to find approximately "one month" out from a given timestamp:

$ echo '2018-03-24T17:15:21.411148Z' | zq -z 'this + 30d' -
2018-04-23T17:15:21.411148Z

Or if you want to be more accurate, maybe use a helper constant:

$ echo '2018-03-24T17:15:21.411148Z' | zq -z 'const MonthIsh = 365.25d / 12 this + MonthIsh' -
2018-04-24T03:45:21.411148Z

And I feel like I should point out that for working with calendar months, the strftime function might come in handy. So for instance if you were trying to do aggregations "by month", you might start from the bucket function and get stuck because its duration-type argument span can't take a month-based parameter. But you can make a string-based year+month calendar bucketing with strftime, such as if I wanted to count the timestamps in the attached times.zng.gz by calendar month.

$ zq -z 'head 3' times.zng.gz
{ts:2015-04-13T09:34:44.73264Z}
{ts:2015-04-13T09:34:44.727039Z}
{ts:2015-04-13T09:34:44.727022Z}

$ zq -z 'YearMonth:=strftime("%Y-%m", ts) | count() by YearMonth' times.zng.gz
{YearMonth:"2015-04",count:169479(uint64)}
{YearMonth:"2015-03",count:326269(uint64)}
...

But these are just food for thought absent specific use cases. Folks watching this issue should please speak up with their details. Thanks!

philrz avatar Oct 14 '24 20:10 philrz

My use case is working with data like a recurring monthly activity, and I wanted to be able to move a due date forward one month and have it "just work" so that Aug 31st could be moved forward to Sep 30 (last day of the month). The case of going from Sep 30 to Oct 30 vs Oct 31 (last day of month) - there's no way I can think to have a reasonable default other than Oct 30 with just simple duration addition ... but, that would be fine with me if I could do + 1mo or somesuch.

I also know how complicated this stuff can get, and y'all may decide it's out of scope for zed, which would be understandable to me.

chrismo avatar Oct 15 '24 14:10 chrismo

@chrismo: When you wrote the last comment I made a note-to-self to see if I could come up with a user-defined function that would address your specific use case, and thanks to being bored on a recent long flight, I had the opportunity to do just that. I'll paste it below and walk through validation that it works. Please do check my work though if you intend to use it in production.

It takes an input value in the time type and returns a last-day-of-the-month after offsetting the date by a number of months. Since the super data model doesn't currently offer a standalone date type, I currently have it returning a string value of the year-month-day format, e.g., 2018-03-24, which can be turned back into a time type easily just wrapping it in a time() cast.

Here's the function definition in last-day-of-month.spq:

func pad(num): (
  (num < 10) ? "0" + string(num) : string(num)
)

func last_day_of_month(ts, offset): (
  (
    over ts
    | yield split(string(ts), 'T')[0]
    | yield grok("%{YEAR:year}-%{MONTHNUM:monthnum}-%{MONTHDAY}", this)
    | year := int64(year),monthnum:=int64(monthnum) - 1
    | start := (year * 12) + monthnum
    | end := start + offset
    | end_year := end / 12
    | end_month := (end % 12) + 1
    | end_day := (end_month in [4, 6, 9, 11]) ? 30 :
                 (end_month in [1, 3, 5, 7, 8, 10, 12]) ? 31 :
                 ((end_year % 4) == 0) and not ((end_year % 100) == 0 and not ((end_year % 400) == 0)) ? 29 : 28
    | yield string(end_year) + '-' + pad(end_month) + '-' + pad(end_day)
  )
)

Here's time values in each month of the year in input file day-in-each-month.jsup:

2018-01-24T17:15:21.411148Z
2018-02-24T17:15:21.411148Z
2018-03-24T17:15:21.411148Z
2018-04-24T17:15:21.411148Z
2018-05-24T17:15:21.411148Z
2018-06-24T17:15:21.411148Z
2018-07-24T17:15:21.411148Z
2018-08-24T17:15:21.411148Z
2018-09-24T17:15:21.411148Z
2018-10-24T17:15:21.411148Z
2018-11-24T17:15:21.411148Z
2018-12-24T17:15:21.411148Z

For the use case you seemed to be describing of wanting to "move a due date forward one month" and get back the appropriate last day of the adjusted month, here it is doing that on each of those inputs by giving the function an offset of 1.

$ super -version
Version: v1.18.0-259-gc8a5fc2b

$ super -I last-day-of-month.spq -c 'yield {orig: this, offset_date: last_day_of_month(this, 1), offset_time: time(last_day_of_month(this, 1))}' day-in-each-month.jsup 
{orig:2018-01-24T17:15:21.411148Z,offset_date:"2018-02-28",offset_time:2018-02-28T00:00:00Z}
{orig:2018-02-24T17:15:21.411148Z,offset_date:"2018-03-31",offset_time:2018-03-31T00:00:00Z}
{orig:2018-03-24T17:15:21.411148Z,offset_date:"2018-04-30",offset_time:2018-04-30T00:00:00Z}
{orig:2018-04-24T17:15:21.411148Z,offset_date:"2018-05-31",offset_time:2018-05-31T00:00:00Z}
{orig:2018-05-24T17:15:21.411148Z,offset_date:"2018-06-30",offset_time:2018-06-30T00:00:00Z}
{orig:2018-06-24T17:15:21.411148Z,offset_date:"2018-07-31",offset_time:2018-07-31T00:00:00Z}
{orig:2018-07-24T17:15:21.411148Z,offset_date:"2018-08-31",offset_time:2018-08-31T00:00:00Z}
{orig:2018-08-24T17:15:21.411148Z,offset_date:"2018-09-30",offset_time:2018-09-30T00:00:00Z}
{orig:2018-09-24T17:15:21.411148Z,offset_date:"2018-10-31",offset_time:2018-10-31T00:00:00Z}
{orig:2018-10-24T17:15:21.411148Z,offset_date:"2018-11-30",offset_time:2018-11-30T00:00:00Z}
{orig:2018-11-24T17:15:21.411148Z,offset_date:"2018-12-31",offset_time:2018-12-31T00:00:00Z}
{orig:2018-12-24T17:15:21.411148Z,offset_date:"2019-01-31",offset_time:2019-01-31T00:00:00Z}

Of course, if you just wanted the last day of the given month, use an offset of 0.

$ super -I last-day-of-month.spq -c 'yield {orig: this, offset_date: last_day_of_month(this, 0), offset_time: time(last_day_of_month(this, 0))}' day-in-each-month.jsup 
{orig:2018-01-24T17:15:21.411148Z,offset_date:"2018-01-31",offset_time:2018-01-31T00:00:00Z}
{orig:2018-02-24T17:15:21.411148Z,offset_date:"2018-02-28",offset_time:2018-02-28T00:00:00Z}
{orig:2018-03-24T17:15:21.411148Z,offset_date:"2018-03-31",offset_time:2018-03-31T00:00:00Z}
{orig:2018-04-24T17:15:21.411148Z,offset_date:"2018-04-30",offset_time:2018-04-30T00:00:00Z}
{orig:2018-05-24T17:15:21.411148Z,offset_date:"2018-05-31",offset_time:2018-05-31T00:00:00Z}
{orig:2018-06-24T17:15:21.411148Z,offset_date:"2018-06-30",offset_time:2018-06-30T00:00:00Z}
{orig:2018-07-24T17:15:21.411148Z,offset_date:"2018-07-31",offset_time:2018-07-31T00:00:00Z}
{orig:2018-08-24T17:15:21.411148Z,offset_date:"2018-08-31",offset_time:2018-08-31T00:00:00Z}
{orig:2018-09-24T17:15:21.411148Z,offset_date:"2018-09-30",offset_time:2018-09-30T00:00:00Z}
{orig:2018-10-24T17:15:21.411148Z,offset_date:"2018-10-31",offset_time:2018-10-31T00:00:00Z}
{orig:2018-11-24T17:15:21.411148Z,offset_date:"2018-11-30",offset_time:2018-11-30T00:00:00Z}
{orig:2018-12-24T17:15:21.411148Z,offset_date:"2018-12-31",offset_time:2018-12-31T00:00:00Z}

You can go backwards with negative offset values.

$ super -I last-day-of-month.spq -c 'yield {orig: this, offset_date: last_day_of_month(this, -1), offset_time: time(last_day_of_month(this, -1))}' day-in-each-month.jsup 
{orig:2018-01-24T17:15:21.411148Z,offset_date:"2017-12-31",offset_time:2017-12-31T00:00:00Z}
{orig:2018-02-24T17:15:21.411148Z,offset_date:"2018-01-31",offset_time:2018-01-31T00:00:00Z}
{orig:2018-03-24T17:15:21.411148Z,offset_date:"2018-02-28",offset_time:2018-02-28T00:00:00Z}
{orig:2018-04-24T17:15:21.411148Z,offset_date:"2018-03-31",offset_time:2018-03-31T00:00:00Z}
{orig:2018-05-24T17:15:21.411148Z,offset_date:"2018-04-30",offset_time:2018-04-30T00:00:00Z}
{orig:2018-06-24T17:15:21.411148Z,offset_date:"2018-05-31",offset_time:2018-05-31T00:00:00Z}
{orig:2018-07-24T17:15:21.411148Z,offset_date:"2018-06-30",offset_time:2018-06-30T00:00:00Z}
{orig:2018-08-24T17:15:21.411148Z,offset_date:"2018-07-31",offset_time:2018-07-31T00:00:00Z}
{orig:2018-09-24T17:15:21.411148Z,offset_date:"2018-08-31",offset_time:2018-08-31T00:00:00Z}
{orig:2018-10-24T17:15:21.411148Z,offset_date:"2018-09-30",offset_time:2018-09-30T00:00:00Z}
{orig:2018-11-24T17:15:21.411148Z,offset_date:"2018-10-31",offset_time:2018-10-31T00:00:00Z}
{orig:2018-12-24T17:15:21.411148Z,offset_date:"2018-11-30",offset_time:2018-11-30T00:00:00Z}

And of course offset values of multiple months work too.

Finally, there's special handling in there for the leap year cases with the appropriate exceptions, which will probably all never get exercised, but I knew if I didn't put it in there it'd bite me somehow. 😄 Input values in leap-centric.jsup:

// Basic rule: If a year is divisible by 4, it's usually a leap year. 
// Exception for century years: If a year is divisible by 100, it's only a leap year if it's also divisible by 400.
2024-01-13T09:34:44.73264Z // Basic. Divisible by 4. A leap year.
2000-01-13T09:34:44.73264Z // Exception case. Divisible by both 100 and 400, so it is a leap year.
1900-01-13T09:34:44.73264Z // Exception case. Divisible by 100 but not by 400, so it is not a leap year.

Exercising that code path:

$ super -I last-day-of-month.spq -c 'yield {orig: this, offset_date: last_day_of_month(this, 1), offset_time: time(last_day_of_month(this, 1))}' leap-centric.jsup 
{orig:2024-01-13T09:34:44.73264Z,offset_date:"2024-02-29",offset_time:2024-02-29T00:00:00Z}
{orig:2000-01-13T09:34:44.73264Z,offset_date:"2000-02-29",offset_time:2000-02-29T00:00:00Z}
{orig:1900-01-13T09:34:44.73264Z,offset_date:"1900-02-28",offset_time:1900-02-28T00:00:00Z}

Let me know if that hits the spot.

philrz avatar Feb 05 '25 03:02 philrz

Let me know if that hits the spot.

And then some :) That looks fantastic, thx!

chrismo avatar Feb 05 '25 14:02 chrismo