Advanced Cycling
Cylc offers two ways of writing recurrences.
- Implicit cycling with the use of ISO8601 date-times with reduced precision (e.g. T00)
- Cycling with ISO8601 recurring time intervals.
The upcoming ISO8601-2 specification adds extensions (to the upcoming ISO8601-1 revision) which may enable the handling of some users' more exotic requirements.
(1) Implicit Cycling
ISO8601 specifies that the hyphen character can be used as a selector, e.g. W-1 (the first day of the week), T-00 (the first minute of the hour). One might be forgiven for mistaking the hyphen for a wildcard in the latter example. ISO8601-2 adds a proper wildcard-style way of writing date-times using the "unspecified" character X:
200X # 2000, 2001, 2002, ..., 2009
2000-X2 # 2000-02, 2000-12
2000-X1-X1 # 2000-01-01, 2000-01-11, 2000-01-21, 2000-01-31, 2000-11-01, 2000-11-11,
# 2000-11-21
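To check my reading of the X examples above, here is a minimal Python sketch that expands the unspecified digits by brute force and keeps only valid calendar dates. The function name `expand_unspecified` is made up for illustration, and it only handles YYYY, YYYY-MM and YYYY-MM-DD patterns; it is not how ISO8601-2 or Cylc define the behaviour.

```python
from datetime import date
from itertools import product


def expand_unspecified(pattern):
    """Expand every 'X' digit in a YYYY, YYYY-MM or YYYY-MM-DD pattern.

    Returns the matching strings that correspond to valid calendar dates.
    """
    slots = [i for i, char in enumerate(pattern) if char == "X"]
    matches = []
    for digits in product("0123456789", repeat=len(slots)):
        chars = list(pattern)
        for index, digit in zip(slots, digits):
            chars[index] = digit
        candidate = "".join(chars)
        fields = [int(field) for field in candidate.split("-")]
        # Pad reduced-precision values so that datetime can validate them.
        year, month, day = (fields + [1, 1])[:3]
        try:
            date(year, month, day)
        except ValueError:
            continue  # e.g. month 21, or 2000-11-31
        matches.append(candidate)
    return matches


for example in ("200X", "2000-X2", "2000-X1-X1"):
    print(example, "->", ", ".join(expand_unspecified(example)))
```

The brute-force approach is fine for a handful of X characters, which is probably all a cycling definition would ever need.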
(2) Recurring Time Intervals
The original ISO8601 specification supports four options for specifying a recurring time interval:
[recurrences/]start/end
[recurrences/]duration
[recurrences/]start/duration
[recurrences/]duration/end
This minimal syntax has some limitations e.g:
- There is no start/duration/end syntax
- Irregular cycling is difficult / messy / impossible (e.g. the payday problem)
The upcoming ISO8601-2 specification adds the ability to specify an optional rule to the recurring time interval i.e:
[recurrences/]start/end[/rule]
[recurrences/]duration[/rule]
[recurrences/]start/duration[/rule]
[recurrences/]duration/end[/rule]
The rule is a semicolon-separated list of key=value pairs. These rules can be used to realise quite eccentric recurrences; I've taken a stab at a few:
Tim Whitcomb's (highly) irregular cycling problem:
R/1999/2015/FREQ=DY;BYMO=7;BYDA=MO,TU,SA,SU;BYHO=12
# Run at 12Z every Monday, Tuesday, Saturday and Sunday, but only in July
# between the years 1999 and 2015.
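For what it's worth, a string of this shape is easy to pull apart mechanically. The sketch below splits the slash-separated recurrence parts from the trailing rule and turns the rule into a dict of lists. `split_recurrence` is a hypothetical helper, and it does no validation of the keys, since the key names in my examples are my own attempt at the syntax.

```python
def split_recurrence(text):
    """Split '<part>/<part>/.../<rule>' into (parts, rule_dict).

    The rule is assumed to be the final '/'-separated field and to look
    like 'KEY=VALUE;KEY=VALUE,VALUE'. Keys are not validated.
    """
    *parts, last = text.split("/")
    if "=" not in last:
        # No rule present, so the final field is an ordinary part.
        return parts + [last], {}
    rule = {}
    for pair in last.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        rule[key] = value.split(",")
    return parts, rule


parts, rule = split_recurrence(
    "R/1999/2015/FREQ=DY;BYMO=7;BYDA=MO,TU,SA,SU;BYHO=12"
)
print(parts)
print(rule)
```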
Run for every Monday in August from 1999 to 2019:
DTSTART=19990101T000000Z
FREQ=WEEKLY;BYMONTH=8;BYDAY=MO;UNTIL=20190101T000000Z
The payday problem:
R/P0Y/FREQ=MO;BYDA=-1FR
# This defines a null time interval which repeats on the last
# Friday of each month.
# Note that from a cylc perspective the time interval
# becomes meaningless in this case (see below).
Jin Lee's third Tuesday of the month problem:
R/P0Y/FREQ=MO;BYDA=+3TU;
# This defines a null time interval which repeats on the third Tuesday
# of every month.
# Note that from a cylc perspective the time interval
# becomes meaningless in this case (see below).
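To make the intent of the payday and third Tuesday rules concrete, here is a small standard-library sketch of what I would expect them to resolve to for a given month. `nth_weekday` is a hypothetical helper, not part of Cylc or of any ISO8601 library.

```python
import calendar
from datetime import date


def nth_weekday(year, month, weekday, n):
    """Return the n-th occurrence of a weekday within a month.

    weekday runs from calendar.MONDAY (0) to calendar.SUNDAY (6).
    n is 1-based; negative values count back from the end of the month.
    """
    if n == 0:
        raise ValueError("n must be non-zero")
    _, days_in_month = calendar.monthrange(year, month)
    matches = [
        date(year, month, day)
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() == weekday
    ]
    return matches[n - 1] if n > 0 else matches[n]


# The payday problem: the last Friday of each month (BYDA=-1FR).
print(nth_weekday(2021, 1, calendar.FRIDAY, -1))
# The third Tuesday of the month (BYDA=+3TU).
print(nth_weekday(2021, 1, calendar.TUESDAY, 3))
```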
The start/duration/end problem:
R/2000/2001/FREQ=DA;BYHR=0,6,12,18
# Run at T00, T06, T12, T18 every day between 2000 and 2001.
The last day of the month problem:
R/P0Y/FREQ=MO;BYMD=-1
# This defines a null time interval which repeats on the last
# day of the month (-2 the penultimate day etc).
# Note that from a cylc perspective the time interval
# becomes meaningless in this case (see below).
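The negative month-day offsets are simple enough to sketch with the standard library as well. `day_from_month_end` is a made-up name, and the -1 / -2 convention just mirrors the comment above.

```python
import calendar
from datetime import date


def day_from_month_end(year, month, offset=-1):
    """Return the date `offset` days from the end of a month.

    -1 is the last day of the month, -2 the penultimate day, and so on.
    """
    if offset >= 0:
        raise ValueError("offset must be negative")
    _, days_in_month = calendar.monthrange(year, month)
    return date(year, month, days_in_month + 1 + offset)


print(day_from_month_end(2000, 2))      # last day of a leap-year February
print(day_from_month_end(2001, 2, -2))  # penultimate day of February 2001
```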
Every hour on the first of the month:
i.e. 01T??00.
RRULE:FREQ=HOURLY;INTERVAL=1;BYMONTHDAY=1;BYMINUTE=0;BYSECOND=0
Unfortunately the syntax feels like a departure from ISO8601: it's difficult to read, lengthy and somewhat complex. It also has the added complication that in cylc the duration component is used to define the interval between runs, as opposed to the duration of each run itself, meaning that when a duration is defined in combination with a repeat rule it becomes meaningless (hence the P0Y in the examples above).
(3) Even more syntax
To add a little extra complication here are examples of some of the exotic syntax extensions defined in ISO8601-2:
[2000, 2001] # Either 2000 OR 2001
{2000, 2001} # Both 2000 AND 2001
{1750..2000} # All the years 1750 to 2000 inclusive
{2000, 2000-01} # Mixed precision is permitted
*/1066 # Time interval ending in the year 1066
..1066 # Before or during the year 1066
1969-22 # The summer of 69
2017-37 # The first quarter of 2017
And for completeness here is some of the really exotic syntax from ISO8601-2:
1066-10-14T08? # 1066-10-14 at 8:00 ish
1066~ # Approximately 1066
1066S2 # Some year between 1000 and 1100 estimated to be 1066
y14E6 # The year 14000000
A lot of these do look useful - I don't know whether it would be worth implementing everything listed (although it would be nice for a sense of completion and being fully ISO8601-2 compliant).
1969-22 # The summer of 69
2017-37 # The first quarter of 2017
are these the special codes you mentioned for seasons/quarters etc..? (they look a bit confusing at first..)
Is there anything else not covered by this latest standard that users ask for?
I don't know whether it would be worth implementing everything listed
We definitely wouldn't want to implement everything, [[[2000?/P1Y]]].
are these the special codes you mentioned for seasons/quarters etc..?
Yes, they take the place of the month digits and assume values from 21 to 39. Options available are:
- Hemisphere independent seasons
- Northern hemisphere seasons
- Southern hemisphere seasons
- Quarters
- Quadrimesters
On the surface these sound potentially useful. Unfortunately I cannot find any information on where the divides are supposed to fall, and given different communities' varied usage I don't imagine they'll be helpful.
Is there anything else not covered by this latest standard that users ask for?
Simplicity perhaps. What with ISO8601 and cylc's extensions to it (min(), !, $, ^, R1), cycling syntax is becoming rather complicated.
The new recurring time interval rules should cover most cases; I can't find any user requests they don't satisfy. ISO8601-2 actually borrows its recurrence rules from the iCalendar standard.
See also the wiki page ISO8601 Vs RRULE.
We could add RRULE cycling as an alternative cycling mode. As the RRule library would return a generator, we might need a different framework to patch this into the current cycling approach. This sits more closely with cycle drivers.
Another cycling use case I recently encountered.
Subtly change the order of tasks in different seasons e.g.
foo => bar => baz during the winter months and bar => foo => baz during the summer months.
The way the user had worked around this is to construct the following with Jinja2:
[[[1201T00Z, 1202T00Z, 1203T00Z, 1204T00Z, ...]]]
graph = # winter graph
[[[0601T00Z, 0602T00Z, 0603T00Z, 0604T00Z, ...]]]
graph = # summer graph
Obviously this is horrendous Cylc abuse as you end up with 182-183 recurrences.
ISO8601 isn't really able to handle this one. There is the R<N>/<start>/<stop> syntax, which we do support, but sadly it doesn't work with truncated dates; otherwise one could do:
[[[R92/0601T00Z/0831T00Z]]]
graph = # summer graph
Though this solution would not work for winter months due to the February problem.
Of course this is trivial in RRULE:
FREQ=DAILY;INTERVAL=1;BYMONTH=6,7,8
Another cycling problem recently encountered:
run a task on the first Tuesday in July, October, January, April
There isn't really an ISO8601:2004 solution; the RRULE solution is this:
RRULE:FREQ=MONTHLY;INTERVAL=1;WKST=MO;BYDAY=TU;BYMONTH=1,4,7,10;BYSETPOS=1;BYHOUR=0;BYMINUTE=0;BYSECOND=0
To give a quick breakdown of that:
# every month
RRULE:FREQ=MONTHLY;INTERVAL=1;WKST=MO;
# but only in January, April, July and October
BYMONTH=1,4,7,10;
# on the first Tuesday of the month
BYDAY=TU;BYSETPOS=1;
# at T00:00:00
BYHOUR=0;BYMINUTE=0;BYSECOND=0
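As a sanity check on the breakdown above, the rule can be fed to the `rrulestr` function from the dateutil package. This is only a sketch: the start date is an arbitrary choice of mine and the rule is evaluated on its own, outside of Cylc.

```python
from datetime import datetime
from itertools import islice

from dateutil.rrule import rrulestr

RULE = (
    "RRULE:FREQ=MONTHLY;INTERVAL=1;WKST=MO;BYDAY=TU;BYMONTH=1,4,7,10;"
    "BYSETPOS=1;BYHOUR=0;BYMINUTE=0;BYSECOND=0"
)

# The start date is an assumption made purely for this illustration.
rule = rrulestr(RULE, dtstart=datetime(2021, 1, 1))

# The rule has no COUNT or UNTIL, so only take the first few points.
first_four = list(islice(rule, 4))
for point in first_four:
    print(point.isoformat())
```

With this start date I would expect the first Tuesdays of January, April, July and October 2021, i.e. the 5th, 6th, 6th and 5th respectively.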
Having more powerful cycling syntax is tempting, but it brings about another problem: how to write dependencies between tasks cycling on different sequences. This is already problematic with ISO8601:2004:
[[[P4W]]]
graph = foo
[[[P1M]]]
graph = bar
[[[???]]]
graph = "foo[???] => bar"
The problem with the ISO standard is that it only considers individual sequences, but lacks a way to represent relationships between such sequences.
It's tricky. With irregular cycling (which is what RRULE opens up), using durations for inter-cycle dependencies just doesn't work. However, the integer cycling approach (-P1) should suffice for most use cases.
Issue #2452 proposes bringing the integer inter-cycle offset to datetime cycling suites, and also, a possible -RN syntax for specifying the previous occurrence on the current recurrence.
[[[P4W]]]
graph = foo
[[[P1M]]]
graph = bar
[[[RRULE:FREQ=MONTHLY;INTERVAL=1;WKST=MO;BYDAY=TU;BYMONTH=1,4,7,10;BYSETPOS=1;BYHOUR=0;BYMINUTE=0;BYSECOND=0]]]
# on this strange recurrence `bar` will depend on the previous instance of `foo`
graph = "foo[-P1] => bar"
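To illustrate one possible reading of "the previous instance of foo", here is a rough sketch that looks up the last foo point strictly before each bar point. The helper `previous_point` and the cycle points are hypothetical, and this is not the semantics proposed in #2452, just the general idea of resolving an offset by position in a sequence rather than by duration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def previous_point(sequence, point):
    """Return the last entry of a sorted sequence strictly before point."""
    index = bisect_left(sequence, point)
    return sequence[index - 1] if index else None


# Hypothetical data: foo cycles every four weeks (P4W) ...
foo_points = [datetime(2021, 1, 4) + timedelta(weeks=4 * i) for i in range(12)]
# ... while bar runs on an irregular recurrence.
bar_points = [datetime(2021, 1, 5), datetime(2021, 4, 6), datetime(2021, 7, 6)]

for bar in bar_points:
    foo = previous_point(foo_points, bar)
    print(f"bar at {bar.date()} depends on foo at {foo.date()}")
```

Whether "previous" should mean strictly before or at-or-before the current point is a design choice; I have assumed strictly before here.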
Just had a go at implementing RRule support and found it was surprisingly easy:
https://github.com/cylc/cylc-flow/compare/master...oliver-sanders:cylc-flow:rrule?expand=1
- Implemented as an extension to the ISO8601 cycler in order to allow RRULE to be mixed in with ISO8601 sequences.
- Uses the dateutil package for rrule support (which appears to be in the stack already).
- In order to make RRule strings kosher for Parsec ingestion I needed to s/=/-/; s/,/_/g.
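The substitution in the last point can be sketched in Python as follows. The function names are made up and this is not the code in the branch. I have assumed every `=` is replaced (not just the first), and the reverse step only converts the first `-` of each `;`-separated pair so that negative values such as `-1FR` survive the round trip.

```python
def escape_rrule(rule):
    """Make an RRULE string safe to use as a config key (assumed mapping)."""
    return rule.replace("=", "-").replace(",", "_")


def unescape_rrule(escaped):
    """Reverse escape_rrule, one KEY-VALUE pair at a time."""
    pairs = []
    for pair in escaped.split(";"):
        if "-" not in pair:
            pairs.append(pair)  # empty or malformed pair, leave untouched
            continue
        # Only the first hyphen separates key from value; any later
        # hyphen belongs to the value (e.g. BYDAY--1FR -> BYDAY=-1FR).
        key, _, value = pair.partition("-")
        pairs.append(key + "=" + value.replace("_", ","))
    return ";".join(pairs)


original = "FREQ=MONTHLY;BYDAY=-1FR;BYMONTH=1,4,7,10"
escaped = escape_rrule(original)
print(escaped)
print(unescape_rrule(escaped))
```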
Here's an example which runs an extremely awkward RRule which is impossible to implement in ISO8601:
# flow.cylc
[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = now
    [[graph]]
        # every 15 minutes, on the minute (zero seconds) BUT
        # * only every other hour
        # * on Tuesdays and Thursdays
        # * in January and June (BYMONTH-1_6)
        RRULE:FREQ-HOURLY;COUNT-30;INTERVAL-2;WKST-MO;BYDAY-TU_TH;BYMONTH-1_6;BYMINUTE-0_15_30_45;BYSECOND-0 = """
            foo
        """
Shockingly this appears to run just fine.
I expect the POC interfaces I've knocked together are buggy but they appear to be enough to prove the concept works.
Another nasty one:
https://cylc.discourse.group/t/unique-scheduling-specification/859/5