timere
timere copied to clipboard
Computing the next valid interval
Very nice library! Thank you for your work :)
I wanted to use Timere
to schedule cron-like tasks but I get stuck when I try to compute the next time the cron should run. For a task that should run every minute, I tried:
Timere.(resolve (after (Timedesc.now ()) &&& seconds [0]))
(* hangs *)
My understanding is that Timere
is only able to optimize the intersections of Pattern
, which seconds
is but since
is not. And so it resorts to computing a list of all the minutes from year 0, then doing an intersection via Seq
.
Knowing this, I can sort of cheat and do multiple resolve
with variations on years
, months
, etc to simulate since now
in a way that pattern can optimize... but it's not really convenient.
Did I miss an easier way of doing this?
In any case, may I suggest a different representation for Resolver.result_space
than lists or even seq? The main issue is that they can't skip any work, so it always costs O(N + M)
for intersections rather than something closer to O(min(N, M))
.
There is this lesser known operation on ordered collections called the successor query. It's very similar to mem
/find
but it returns a more useful result when the query fails:
val successor : Timere.t -> Timedesc.timestamp -> Timedesc.Interval.t option
The idea is that it should return the interval that contains the timestamp -- but if no such interval exists, then the very next interval that follows (or None
at the end when the timestamp is too big.) So it's exactly the operation I'm trying to do to find when my cron should run next!
Forgive my bad illustrations but in both situations below, successor
would return the [t0, t1)
interval:
timestamp
|
v
[-------) [t0--------t1) [------------)
timestamp
|
v
[-------) [t0--------t1) [-----)
(We can recover mem
/find
by checking if timestamp >= t0
)
The successor
operation can help a lot for computing intersection
, because it can skip forward and it also provides a uniform implicit representation of Timere.t
. In pseudo-code:
- The
union
of two successors is their minimum - We can produce a
Seq
out of a successor function by querying it atTimestamp.min_val
, getting back the first interval[t0, t1)
then iterating witht1
to search for the following one - The
intersection
is a tiny bit more complex, we have to search forward until both successors return overlapping intervals. Every failure to intersect moves us forward though.
For my situation seconds [0] &&& since now
, the intersection would go through the steps:
-
successor (seconds [0]) Timestamp.min_val
yields a first candidate[0, 1)
-
successor (since now) 0
returns[now, inf)
which is incompatible, so we should retry with the new lower-bound -
successor (seconds [0]) now
yields a[now + dt, now + dt + 1)
which is an overlapping interval with the previous[now, inf)
- and so we managed to skip all the intermediate minutes from year 0!
It's not perfect, as you can still build intervals that have no intersection and the successor is not going to help:
[---] [---] [---]
[---] [---] [---]
But in general it's a lot better when intersections do exist! (I used it in some terrible code that turned out to be fast enough thanks to the successor)
... And finally, the successor
representation could also open the door to operations like "every 7s" that don't fit in a Pattern.
Hm, it does take surprisingly long to resolve, and indeed I think your explanation is correct.
Though I vaguely recall having some handling for cases like this, let me check...
I'll need to read your design of result_space
a bit more closely before I can give a proper response.
Very nice library! Thank you for your work :)
: D
W.r.t. your proposal:
In any case, may I suggest a different representation for Resolver.result_space than lists or even seq? The main issue is that they can't skip any work, so it always costs O(N + M) for intersections rather than something closer to O(min(N, M)).
There are a lot more skipping happening already, and successor query is accommodated in one way or another internally already. Namely internally it'll propagate the bound to the pattern resolver for it to restart the pattern matching at a different interval.
The result space is intentionally a very rough overapproximated bound in any case, and should not be dense enough to be the cause of slowdown, see the approximation done for intervals in overapproximate_result_space_bottom_up
:
| Intervals (_, s) -> (
match s () with
| Seq.Nil -> t
| Seq.Cons ((start, _), _) ->
Intervals ([ (start, default_result_space_end_exc) ], s))
And there are static and dynamic adjustment to the actual search space for union and intersection.
For static adjustment pass, mainly see:
-
optimize_search_space
, which uses the following two internally-
overapproximate_result_space_bottom_up
-
restrict_result_space_top_down
-
For dynamic adjustments, some examples:
- within
aux_inter
let rec aux_inter' ~start (timeres : t list) =
let interval_batches = resolve ~start search_using_tz timeres in
let batch_for_sampling = collect_batch interval_batches in
if List.exists CCOption.is_none batch_for_sampling then Seq.empty
else
let batch_for_sampling =
CCList.map
(CCOption.get_exn_or "Unexpected None in batch_for_sampling")
batch_for_sampling
in
match batch_for_sampling with
| [] -> Seq.empty
| _ ->
let open Timedesc.Span in
let rightmost_interval =
batch_for_sampling
|> List.sort_uniq (fun x y -> Time.Interval'.compare y x)
|> List.hd
in
let rightmost_start = fst rightmost_interval in
let end_exc = rightmost_start + inter_slice_size in
(* we shift the start of our scope to rightmost_start *)
let timeres =
slice_result_space_multi ~start:rightmost_start timeres
in
(* refresh the interval batches if the gap is too large *)
let interval_batches =
if
rightmost_start - start
>= dynamic_search_space_adjustment_trigger_size
then resolve ~start:rightmost_start search_using_tz timeres
else slice_batches ~start:rightmost_start interval_batches
in
let intervals_up_to_end_exc =
interval_batches
|> CCList.to_seq
|> Seq.map (Intervals.Slice.slice ~skip_check:true ~end_exc)
|> Intervals.Inter.inter_multi_seq ~skip_check:true
in
fun () ->
Seq.Cons (intervals_up_to_end_exc, aux_inter' ~start:end_exc timeres)
where we do the skip based on a bundle of timere objects
- within
aux_union
let rec aux_union' (timeres : t Seq.t) (intervals : Interval'.t Seq.t) =
match intervals () with
| Seq.Nil -> Seq.empty
| Seq.Cons ((start, end_exc), rest) ->
let open Timedesc.Span in
let size = end_exc - start in
if size >= dynamic_search_space_adjustment_trigger_size then
let timeres = slice_result_space_multi_seq ~start:end_exc timeres in
let next_intervals =
resolve_and_merge timeres
|> OSeq.drop_while (fun x -> Time.Interval'.le x (start, end_exc))
in
fun () ->
Seq.Cons ((start, end_exc), aux_union' timeres next_intervals)
else fun () -> Seq.Cons ((start, end_exc), aux_union' timeres rest)
I added some debug statements and can confirm the result space bounds were propagated correctly, and minutes
seem to be very quick, so I am suspecting inefficiency in the pattern_resolver
.
I'll try to optimise it and see where that leads us.
Right now I suspect there are too many conversions from set to seq - I'll play around with pattern_resolver.ml
when I have time later.
Many thanks for raising the issue!
Hm...now I'm not very sure...
Aha...might have found the culprit...
Might be missing a call to slice_result_space_multi
in aux_inter'
Ooh right sorry, I missed the overapproximate
optimization! Indeed it calculates the right lowerbound (= now
), but then it searches from 2022/01/01 until today. Maybe it's because the pattern enumeration only uses the year of the lowerbound? (overall_search_start
is reset to year/01/01
below)
https://github.com/daypack-dev/timere/blob/1e272e24bac0a5f88eed962581e14375aab3cf65/src/pattern_resolver.ml#L508-L515
There's still a lot of code that I haven't read, so maybe it's not this! Thank you for looking into it so fast :)
Maybe it's because the pattern enumeration only uses the year of the lowerbound?
Oh yeah possibly! I'll need to remind myself of how all these work, it's been a year or more since I've last worked on them.
I think you're spot on.
Through dune utop src/
:
let x = Result.get_ok @@ Timere.(resolve
(after (Timedesc.make_exn ~year:2022 ~month:01 ~day:01 ~hour:01 ~minute:01 ~second:01 ()) &&& seconds [0]))
is significantly faster than
let x = Result.get_ok @@ Timere.(resolve
(after (Timedesc.make_exn ~year:2022 ~month:12 ~day:01 ~hour:01 ~minute:01 ~second:01 ()) &&& seconds [0]))
Inefficiency of pattern resolver has been addressed in #56
That was fast! Thanks a lot, it's much appreciated :)
I'm not impacted by the following, but after reading and thinking about the code, I believe there's an interesting edge case between the "search start time" optimization and lengthen
:
# Timere.(resolve_exn
(since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:0 ~second:0 ())
&&& lengthen (Timedesc.Span.make_small ~s:(30 * 60) ()) (minutes [0]))) ;;
[2022 Jan 01 00:00:00 +00:00:00, 2022 Jan 01 00:31:00 +00:00:00)
[2022 Jan 01 01:00:00 +00:00:00, 2022 Jan 01 01:31:00 +00:00:00)
[2022 Jan 01 02:00:00 +00:00:00, 2022 Jan 01 02:31:00 +00:00:00)
If the date is increased by a minute, then the first interval is lost even though it still overlaps with since
:
# Timere.(resolve_exn (* vvvvvvvvv was 0 *)
(since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:1 ~second:0 ())
&&& lengthen (Timedesc.Span.make_small ~s:(30 * 60) ()) (minutes [0]))) ;;
[2022 Jan 01 01:00:00 +00:00:00, 2022 Jan 01 01:31:00 +00:00:00)
[2022 Jan 01 02:00:00 +00:00:00, 2022 Jan 01 02:31:00 +00:00:00)
[2022 Jan 01 03:00:00 +00:00:00, 2022 Jan 01 03:31:00 +00:00:00)
My understanding is that we would expect the first interval to be truncated by a minute, as it's what we get from a similar expression:
# Timere.(resolve_exn
(since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:1 ~second:0 ())
&&& (pattern_intervals `Whole
(Points.make_exn ~lean_toward:`Earlier ~hour:0 ~minute:0 ~second:0 ())
(Points.make_exn ~lean_toward:`Earlier ~hour:0 ~minute:31 ~second:0 ())))) ;;
[2022 Jan 01 00:01:00 +00:00:00, 2022 Jan 01 00:31:00 +00:00:00) (* <--- *)
[2022 Jan 02 00:00:00 +00:00:00, 2022 Jan 02 00:31:00 +00:00:00)
[2022 Jan 03 00:00:00 +00:00:00, 2022 Jan 03 00:31:00 +00:00:00)
I believe the "search start time" for patterns has to subtract the intervals durations to catch these overlapping ranges (that starts before but end after)... Anyway I'm not impacted at all by this behavior, but I could not resist thinking about the cool algorithms involved in timere!
Thanks again for your quick support!
Yes indeed lengthen
has been a source of headache for the result/search space adjustment, also shift
.
(I usually leave discovery of these cases to fuzzer, but I lack the equipment for fuzzing right now sadly). But I think you found a case that wasn't caught by fuzzer previously in any case : D
Fixed now by #57 I believe
Yes it works! Thanks a ton, I'm a big fan of your reactivity and I'm very happy of how easy the cron scheduling turned out be thanks to timere! Have a nice weekend :)
Good to hear!
Right now somewhat waiting on timedesc 0.7.0 publish before submitting this version of timere, but if pointing to the git repo suffices for you then I'm not gonna try to hurry too much as I'm still busy with other stuff.
Yes the git pin is great! There's absolutely no need to rush, please take your time :)