Advanced Cycling

Open oliver-sanders opened this issue 8 years ago • 11 comments

Cylc offers two ways of writing recurrences.

  1. Implicit cycling with the use of ISO8601 date-times with reduced precision (e.g. T00)
  2. Cycling with ISO8601 recurring time intervals.

The upcoming ISO8601-2 specification adds extensions (to the upcoming ISO8601-1 revision) which may enable the handling of some users' more exotic requirements.

(1) Implicit Cycling

ISO8601 specifies that the hyphen character can be used as a selector, e.g. W-1 (the first day of the week) or T-00 (the first minute of the hour). One might be forgiven for mistaking the hyphen for a wildcard in the latter example. ISO8601-2 introduces a proper wildcard-style notation for date-times using the "unspecified" character X:

200X        # 2000, 2001, 2002, ..., 2009
2000-X2     # 2000-02, 2000-12
2000-X1-X1  # 2000-01-01, 2000-01-11, 2000-01-21, 2000-01-31, 2000-11-01, 2000-11-11,
            # 2000-11-21
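
The expansion above can be reproduced with a short Python sketch. The function name `expand_unspecified` is hypothetical, and the sketch only handles plain YYYY, YYYY-MM and YYYY-MM-DD patterns; it is not a full ISO8601-2 parser.

```python
from datetime import date
from itertools import product


def expand_unspecified(pattern: str) -> list[str]:
    """Expand each 'X' to the digits 0-9, keeping only real calendar values."""
    slots = [i for i, ch in enumerate(pattern) if ch == "X"]
    results = []
    for digits in product("0123456789", repeat=len(slots)):
        chars = list(pattern)
        for slot, digit in zip(slots, digits):
            chars[slot] = digit
        candidate = "".join(chars)
        parts = [int(part) for part in candidate.split("-")]
        # Pad a missing month/day with 1 so date() can validate what is there.
        year, month, day = (parts + [1, 1])[:3]
        try:
            date(year, month, day)
        except ValueError:
            continue  # e.g. month 21, or 2000-11-31
        results.append(candidate)
    return results


for pattern in ("200X", "2000-X2", "2000-X1-X1"):
    print(pattern, "->", ", ".join(expand_unspecified(pattern)))
```

Validating each candidate with `date()` is what drops impossible values such as 2000-11-31.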

(2) Recurring Time Intervals

The original ISO8601 specification supports four options for specifying a recurring time interval:

[recurrences/]start/end
[recurrences/]duration
[recurrences/]start/duration
[recurrences/]duration/end

This minimal syntax has some limitations, e.g.:

  • There is no start/duration/end syntax
  • Irregular cycling is difficult / messy / impossible (e.g. the payday problem)

The upcoming ISO8601-2 specification adds the ability to specify an optional rule to the recurring time interval i.e:

[recurrences/]start/end[/rule]
[recurrences/]duration[/rule]
[recurrences/]start/duration[/rule]
[recurrences/]duration/end[/rule]

The rule is a semicolon-separated list of key=value pairs. These rules can be used to realise quite eccentric recurrences. I've taken a stab at a few:

Tim Whitcomb's (highly) irregular cycling problem:

R/1999/2015/FREQ=DY;BYMO=7;BYDA=MO,TU,SA,SU;BYHO=12
# Run at 12Z every Monday, Tuesday, Saturday and Sunday, but only in July
# between the years 1999 and 2015.

Run for every Monday in August from 1999 to 2019:

DTSTART=19990101T000000Z
FREQ=WEEKLY;BYMONTH=8;BYDAY=MO;UNTIL=20190101T000000Z

The payday problem:

R/P0Y/FREQ=MO;BYDA=-1FR
# This defines a null time interval which repeats on the last
# Friday of each month.

# Note that from a cylc perspective the time interval
# becomes meaningless in this case (see below).

Jin Lee's third Tuesday of the month problem:

R/P0Y/FREQ=MO;BYDA=+3TU;
# This defines a null time interval which repeats on the third Tuesday
# of every month.

# Note that from a cylc perspective the time interval
# becomes meaningless in this case (see below).
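
To sanity-check what the payday (-1FR) and third Tuesday (+3TU) rules should produce, here is a small standard-library Python sketch. The helper `nth_weekday` is a hypothetical name; it computes the target dates directly rather than parsing the rule strings.

```python
import calendar
from datetime import date


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday of a month; negative n counts from the end.

    weekday follows the calendar module (calendar.MONDAY == 0).
    """
    if n == 0:
        raise ValueError("n must be non-zero")
    days_in_month = calendar.monthrange(year, month)[1]
    matches = [
        date(year, month, day)
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() == weekday
    ]
    # Positive n is 1-based from the start; negative n indexes from the end.
    return matches[n - 1] if n > 0 else matches[n]


# Last Friday and third Tuesday for the first three months of 2017.
for month in (1, 2, 3):
    print(
        nth_weekday(2017, month, calendar.FRIDAY, -1),
        nth_weekday(2017, month, calendar.TUESDAY, 3),
    )
```

Asking for an occurrence that does not exist (e.g. a sixth Friday) raises an IndexError, which is probably the right behaviour for a sketch.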

The start/duration/end problem:

R/2000/2001/FREQ=DA;BYHR=0,6,12,18
# Run at T00, T06, T12, T18 every day between 2000 and 2001.

The last day of the month problem:

R/P0Y/FREQ=MO;BYMD=-1
# This defines a null time interval which repeats on the last
# day of the month (-2 the penultimate day etc).

# Note that from a cylc perspective the time interval
# becomes meaningless in this case (see below).

Every hour on the first of the month:

i.e. 01T??00.

RRULE:FREQ=HOURLY;INTERVAL=1;BYMONTHDAY=1;BYMINUTE=0;BYSECOND=0

Unfortunately the syntax feels like a departure from ISO8601: it's difficult to read, lengthy and somewhat complex. It also has the added complication that in cylc the duration component is used to define the interval between runs, as opposed to the duration of each run itself. This means that when a duration is defined in combination with a repeat rule it becomes meaningless (hence the P0Y in the examples above).
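
For experimenting with the R/.../rule strings used in this section, a minimal Python sketch can split them into labelled parts. The function name `parse_recurrence` and the shape-based classification are assumptions made for illustration; nothing here is taken from ISO8601-2 or from Cylc's parser.

```python
def parse_recurrence(expr: str) -> dict:
    """Split a recurrence such as 'R/P0Y/FREQ=MO;BYDA=-1FR' into labelled parts.

    Tokens are classified purely by shape: anything containing '=' is the
    rule, 'R...' is the repeat count, 'P...' a duration, the rest date-times.
    """
    parsed = {"recurrences": None, "datetimes": [], "duration": None, "rule": {}}
    for token in expr.split("/"):
        if "=" in token:
            for pair in token.split(";"):
                if pair:  # tolerate a trailing ';'
                    key, _, value = pair.partition("=")
                    parsed["rule"][key] = value.split(",")
        elif token.startswith("R"):
            parsed["recurrences"] = token[1:] or "unbounded"
        elif token.startswith("P"):
            parsed["duration"] = token
        else:
            parsed["datetimes"].append(token)
    return parsed


print(parse_recurrence("R/1999/2015/FREQ=DY;BYMO=7;BYDA=MO,TU,SA,SU;BYHO=12"))
print(parse_recurrence("R/P0Y/FREQ=MO;BYDA=+3TU;"))
```

The sketch does not handle the iCalendar-style "RRULE:" prefix, only the slash-separated form.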

(3) Even more syntax

To add a little extra complication here are examples of some of the exotic syntax extensions defined in ISO8601-2:

[2000, 2001]     # Either 2000 OR 2001
{2000, 2001}     # Both 2000 AND 2001
{1750..2000}     # All the years 1750 to 2000 inclusive
{2000, 2000-01}  # Mixed precision is permitted
*/1066           # Time interval ending in the year 1066
..1066           # Before or during the year 1066
1969-22          # The summer of 69
2017-37          # The first quadrimester of 2017

And for completeness here is some of the really exotic syntax from ISO8601-2:

1066-10-14T08?   # 1066-10-14 at 8:00 ish
1066~            # Approximately 1066
1066S2           # Some year between 1000 and 1100 estimated to be 1066
y14E6            # The year 14000000

oliver-sanders (Aug 15 '17 11:08)

A lot of these do look useful - I don't know whether it would be worth implementing everything listed (although it would be nice for a sense of completion and being fully ISO8601-2 compliant).

> 1969-22 # The summer of 69
> 2017-37 # The first quadrimester of 2017

Are these the special codes you mentioned for seasons/quarters etc.? (They look a bit confusing at first.)

Is there anything else not covered by this latest standard that users ask for?

dvalters (Aug 15 '17 11:08)

> I don't know whether it would be worth implementing everything listed

We definitely wouldn't want to implement everything, e.g. [[[2000?/P1Y]]].

> are these the special codes you mentioned for seasons/quarters etc..?

Yes, they take the place of the month digits and assume values from 21 to 39. Options available are:

  • Hemisphere independent seasons
  • Northern hemisphere seasons
  • Southern hemisphere seasons
  • Quarters
  • Quadrimesters

On the surface these may sound potentially useful. Unfortunately I cannot find any information on where the divides are supposed to be, and given different communities' varied usage I don't imagine they'll be helpful.

> Is there anything else not covered by this latest standard that users ask for?
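
As a rough illustration of how these codes could be decoded, here is a Python sketch. The code ranges are an assumption taken from the published EDTF tables (the draft discussed here may have differed), and `describe_sub_year` is a hypothetical helper name.

```python
# Assumed code ranges, following the published EDTF tables; only the
# 21-39 range mentioned in this thread is covered.
SEASONS = ("spring", "summer", "autumn", "winter")
GROUPS = [
    (21, "season (hemisphere independent)", SEASONS),
    (25, "season (northern hemisphere)", SEASONS),
    (29, "season (southern hemisphere)", SEASONS),
    (33, "quarter", ("1", "2", "3", "4")),
    (37, "quadrimester", ("1", "2", "3")),
]


def describe_sub_year(value: str) -> str:
    """Describe a 'YYYY-NN' value whose NN is a sub-year grouping code."""
    year, _, code = value.partition("-")
    number = int(code)
    for first, label, members in GROUPS:
        if first <= number < first + len(members):
            return f"{year}: {label} {members[number - first]}"
    raise ValueError(f"{code} is not a sub-year grouping code in the range 21-39")


print(describe_sub_year("1969-22"))
print(describe_sub_year("2017-33"))
```

Note that this only names the grouping; it says nothing about where each season or quarter starts and ends, which is the missing piece described above.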

Simplicity perhaps. What with ISO8601 and cylc's extensions to it (min(), !, $, ^, R1), cycling syntax is becoming rather complicated.

The new recurring time interval rules should cover most cases; I can't find any user requests they don't satisfy. ISO8601-2 actually borrows its recurrence rules from the iCalendar standard.

oliver-sanders (Aug 15 '17 12:08)

See also ISO8601 Vs RRULE on the wiki.

We could add RRULE cycling as an alternative cycling mode. As the RRule library would return a generator, we might need a different framework to patch this into the current cycling approach. This sits more closely with cycle drivers.

oliver-sanders (Nov 19 '18 16:11)

Another cycling use case I recently encountered.

Subtly change the order of tasks in different seasons e.g. foo => bar => baz during the winter months and bar => foo => baz during the summer months.

The way the user had worked around this is to construct the following with Jinja2:

[[[1201T00Z, 1202T00Z, 1203T00Z, 1204T00Z, ...]]]
    graph = # winter graph
[[[0601T00Z, 0602T00Z, 0603T00Z, 0604T00Z, ...]]]
    graph = # summer graph

Obviously this is horrendous Cylc abuse as you end up with 182-183 recurrences.

ISO8601 isn't really able to handle this one. There is the R<N>/<start>/<stop> syntax, which we do support, but sadly it doesn't work with truncated dates; otherwise one could do:

[[[R92/0601T00Z/0831T00Z]]]
    graph = # summer graph

Though this solution would not work for winter months due to the February problem.

Of course this is trivial in RRULE:

FREQ=DAILY;INTERVAL=1;BYMONTH=6,7,8
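
The February problem is easy to see by counting days. A minimal Python sketch, assuming summer means June to August and winter means December to February (the helper name `season_days` is hypothetical):

```python
from datetime import date, timedelta


def season_days(start: date, end: date) -> list[date]:
    """Every day from start to end inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


for year in (2019, 2020):
    summer = season_days(date(year, 6, 1), date(year, 8, 31))
    # Winter ends on the last day of February, whatever that is this year.
    winter = season_days(date(year - 1, 12, 1), date(year, 3, 1) - timedelta(days=1))
    print(year, "summer:", len(summer), "winter:", len(winter))
```

Summer always has 92 days, so a fixed repeat count would work there, whereas the winter count changes in leap years, so no single R<N> value fits.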

oliver-sanders (Jun 05 '19 08:06)

Another cycling problem recently encountered:

run a task on the first Tuesday in July, October, January, April

There isn't really an ISO8601:2005 solution; the RRULE solution is this:

RRULE:FREQ=MONTHLY;INTERVAL=1;WKST=MO;BYDAY=TU;BYMONTH=1,4,7,10;BYSETPOS=1;BYHOUR=0;BYMINUTE=0;BYSECOND=0

To give a quick breakdown of that:

# every month
RRULE:FREQ=MONTHLY;INTERVAL=1;WKST=MO;
# but only in January, April, July and October
BYMONTH=1,4,7,10;
# on the first Tuesday of the month
BYDAY=TU;BYSETPOS=1;
# at T00:00:00
BYHOUR=0;BYMINUTE=0;BYSECOND=0
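
The rule can be checked with the third-party python-dateutil package, whose `rrulestr` parses iCalendar-style rule strings. This is a quick sketch; the `dtstart` of 2020-01-01 is an arbitrary assumption, since the rule itself carries no start date.

```python
from datetime import datetime
from itertools import islice

from dateutil.rrule import rrulestr  # third-party: python-dateutil

RULE = (
    "RRULE:FREQ=MONTHLY;INTERVAL=1;WKST=MO;BYDAY=TU;BYMONTH=1,4,7,10;"
    "BYSETPOS=1;BYHOUR=0;BYMINUTE=0;BYSECOND=0"
)

# The start point is chosen for illustration only.
recurrence = rrulestr(RULE, dtstart=datetime(2020, 1, 1))

# The rule is unbounded, so take just the first four points.
points = list(islice(recurrence, 4))
for point in points:
    print(point.isoformat())
```

Each printed point should fall on a Tuesday within the first seven days of January, April, July or October.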

oliver-sanders (Jul 08 '20 09:07)

Having more powerful cycling syntax is tempting, but it brings about another problem: how to write dependencies between tasks cycling on different sequences. This is already problematic with ISO8601:2005:

[[[P4W]]]
graph = foo

[[[P1M]]]
graph = bar

[[[???]]]
graph = "foo[???] => bar"

The problem with the ISO standard is that it only considers individual sequences, but lacks a way to represent relationships between such sequences.

TomekTrzeciak (Jul 08 '20 17:07)

> The problem with the ISO standard is that it only considers individual sequences, but lacks a way to represent relationships between such sequences.

It's tricky. With irregular cycling (which is what RRULE opens up), using durations for inter-cycle dependencies just doesn't work. However, the integer cycling approach (-P1) should suffice for most use cases.

Issue #2452 proposes bringing the integer inter-cycle offset to datetime cycling suites, and also, a possible -RN syntax for specifying the previous occurrence on the current recurrence.

[[[P4W]]]
graph = foo

[[[P1M]]]
graph = bar

[[[RRULE:FREQ=MONTHLY;INTERVAL=1;WKST=MO;BYDAY=TU;BYMONTH=1,4,7,10;BYSETPOS=1;BYHOUR=0;BYMINUTE=0;BYSECOND=0]]]
# on this strange recurrence `bar` will depend on the previous instance of `foo`
graph = "foo[-P1] => bar"
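
To make "the previous instance of foo" concrete, here is a standard-library Python sketch that pairs each P1M point with the latest P4W point. Treating "previous" as "at or before" is an assumption made here, not something #2452 defines, and all function names are hypothetical.

```python
from bisect import bisect_right
from datetime import date, timedelta
from typing import Optional


def four_weekly(start: date, count: int) -> list[date]:
    """Points of a P4W sequence."""
    return [start + timedelta(weeks=4 * n) for n in range(count)]


def monthly(start: date, count: int) -> list[date]:
    """Points of a P1M sequence (start.day must exist in every month, e.g. 1)."""
    points = []
    year, month = start.year, start.month
    for _ in range(count):
        points.append(date(year, month, start.day))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return points


def previous_instance(points: list[date], target: date) -> Optional[date]:
    """Latest point at or before target, or None if the sequence starts later."""
    index = bisect_right(points, target)
    return points[index - 1] if index else None


foo = four_weekly(date(2020, 1, 1), 14)
for bar_point in monthly(date(2020, 1, 1), 4):
    print("bar", bar_point, "depends on foo", previous_instance(foo, bar_point))
```

The lookup is positional rather than duration-based, so it does not care that the gap between a bar point and its foo point varies from month to month.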

oliver-sanders (Jul 08 '20 17:07)

Just had a go at implementing RRule support and found it was surprisingly easy:

https://github.com/cylc/cylc-flow/compare/master...oliver-sanders:cylc-flow:rrule?expand=1

  • Implemented as an extension to the ISO8601 cycler in order to allow RRULE to be mixed in with ISO8601 sequences.
  • Uses the dateutil package for rrule support (which appears to be in the stack already).
  • In order to make RRule strings kosher for Parsec ingestion I needed to s/=/-/g; s/,/_/g.
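
The substitution in the last bullet, and one way to reverse it, can be sketched in Python. This is not the code from the linked branch; `encode_rrule` and `decode_rrule` are hypothetical names.

```python
def encode_rrule(rule: str) -> str:
    """Apply the two substitutions: '=' becomes '-' and ',' becomes '_'."""
    return rule.replace("=", "-").replace(",", "_")


def decode_rrule(encoded: str) -> str:
    """Reverse encode_rrule.

    Each pair is split on its first '-' only, so that negative values
    such as BYDAY=-1FR survive the round trip.
    """
    prefix, colon, body = encoded.rpartition(":")
    pairs = []
    for pair in body.split(";"):
        if pair:
            key, _, value = pair.partition("-")
            pairs.append(f"{key}={value.replace('_', ',')}")
    return prefix + colon + ";".join(pairs)


rule = "RRULE:FREQ=MONTHLY;BYDAY=-1FR,-1SA"
encoded = encode_rrule(rule)
print(encoded)
print(decode_rrule(encoded))
```

A naive reverse substitution of every '-' would corrupt negative offsets, which is why the decoder splits on the first '-' of each pair.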

Here's an example which runs an extremely awkward RRule which is impossible to implement in ISO8601:

# flow.cylc
[scheduler]
    allow implicit tasks = True

[scheduling]
    initial cycle point = now
    [[graph]]
        # every 15 minutes, at 0 seconds past the minute, BUT
        # * only every other hour
        # * on Tuesdays and Thursdays
        # * in January and June
        RRULE:FREQ-HOURLY;COUNT-30;INTERVAL-2;WKST-MO;BYDAY-TU_TH;BYMONTH-1_6;BYMINUTE-0_15_30_45;BYSECOND-0 = """
            foo
        """

Shockingly this appears to run just fine.

I expect the POC interfaces I've knocked together are buggy but they appear to be enough to prove the concept works.

oliver-sanders (Oct 07 '22 10:10)

Another nasty one:

https://cylc.discourse.group/t/unique-scheduling-specification/859/5

oliver-sanders (Jan 19 '24 10:01)