timere Computing the next valid interval

Computing the next valid interval

Open art-w opened this issue 2 years ago • 13 comments

Very nice library! Thank you for your work :)

I wanted to use Timere to schedule cron-like tasks but I get stuck when I try to compute the next time the cron should run. For a task that should run every minute, I tried:

Timere.(resolve (after (Timedesc.now ()) &&& seconds [0]))
(* hangs *)

My understanding is that Timere is only able to optimize the intersections of Pattern, which seconds is but since is not. And so it resorts to computing a list of all the minutes from year 0, then doing an intersection via Seq.

Knowing this, I can sort of cheat and do multiple resolve with variations on years, months, etc to simulate since now in a way that pattern can optimize... but it's not really convenient.

Did I miss an easier way of doing this?

In any case, may I suggest a different representation for Resolver.result_space than lists or even seq? The main issue is that they can't skip any work, so it always costs O(N + M) for intersections rather than something closer to O(min(N, M)).

There is this lesser known operation on ordered collections called the successor query. It's very similar to mem/find but it returns a more useful result when the query fails:

val successor : Timere.t -> Timedesc.timestamp -> Timedesc.Interval.t option

The idea is that it should return the interval that contains the timestamp -- but if no such interval exists, then the very next interval that follows (or None at the end when the timestamp is too big.) So it's exactly the operation I'm trying to do to find when my cron should run next!

Forgive my bad illustrations but in both situations below, successor would return the [t0, t1) interval:

              timestamp
                 |
                 v
[-------)    [t0--------t1)    [------------)


              timestamp
                 |
                 v
[-------)                [t0--------t1)    [-----)

(We can recover mem/find by checking if timestamp >= t0)

The successor operation can help a lot for computing intersection, because it can skip forward and it also provides a uniform implicit representation of Timere.t. In pseudo-code:

The union of two successors is their minimum
We can produce a Seq out of a successor function by querying it at Timestamp.min_val, getting back the first interval [t0, t1) then iterating with t1 to search for the following one
The intersection is a tiny bit more complex, we have to search forward until both successors return overlapping intervals. Every failure to intersect moves us forward though.

For my situation seconds [0] &&& since now, the intersection would go through the steps:

successor (seconds [0]) Timestamp.min_val yields a first candidate [0, 1)
successor (since now) 0 returns [now, inf) which is incompatible, so we should retry with the new lower-bound
successor (seconds [0]) now yields a [now + dt, now + dt + 1) which is an overlapping interval with the previous [now, inf)
and so we managed to skip all the intermediate minutes from year 0!

It's not perfect, as you can still build intervals that have no intersection and the successor is not going to help:

[---]       [---]       [---]
      [---]       [---]       [---]

But in general it's a lot better when intersections do exist! (I used it in some terrible code that turned out to be fast enough thanks to the successor)

... And finally, the successor representation could also open the door to operations like "every 7s" that don't fit in a Pattern.

Mar 08 '22 18:03 art-w

Hm, it does take surprisingly long to resolve, and indeed I think your explanation is correct.

Though I vaguely recall having some handling for cases like this, let me check...

I'll need to read your design of result_spacea bit more closely before I can give a proper response.

Mar 09 '22 02:03 darrenldl

Very nice library! Thank you for your work :)

: D

W.r.t. your proposal:

In any case, may I suggest a different representation for Resolver.result_space than lists or even seq? The main issue is that they can't skip any work, so it always costs O(N + M) for intersections rather than something closer to O(min(N, M)).

There are a lot more skipping happening already, and successor query is accommodated in one way or another internally already. Namely internally it'll propagate the bound to the pattern resolver for it to restart the pattern matching at a different interval.

The result space is intentionally a very rough overapproximated bound in any case, and should not be dense enough to be the cause of slowdown, see the approximation done for intervals in overapproximate_result_space_bottom_up:

    | Intervals (_, s) -> (
        match s () with
        | Seq.Nil -> t
        | Seq.Cons ((start, _), _) ->
          Intervals ([ (start, default_result_space_end_exc) ], s))

And there are static and dynamic adjustment to the actual search space for union and intersection.

For static adjustment pass, mainly see:

optimize_search_space, which uses the following two internally
- overapproximate_result_space_bottom_up
- restrict_result_space_top_down

For dynamic adjustments, some examples:

within aux_inter

  let rec aux_inter' ~start (timeres : t list) =
    let interval_batches = resolve ~start search_using_tz timeres in
    let batch_for_sampling = collect_batch interval_batches in
    if List.exists CCOption.is_none batch_for_sampling then Seq.empty
    else
      let batch_for_sampling =
        CCList.map
          (CCOption.get_exn_or "Unexpected None in batch_for_sampling")
          batch_for_sampling
      in
      match batch_for_sampling with
      | [] -> Seq.empty
      | _ ->
        let open Timedesc.Span in
        let rightmost_interval =
          batch_for_sampling
          |> List.sort_uniq (fun x y -> Time.Interval'.compare y x)
          |> List.hd
        in
        let rightmost_start = fst rightmost_interval in
        let end_exc = rightmost_start + inter_slice_size in
        (* we shift the start of our scope to rightmost_start *)
        let timeres =
          slice_result_space_multi ~start:rightmost_start timeres
        in
        (* refresh the interval batches if the gap is too large *)
        let interval_batches =
          if
            rightmost_start - start
            >= dynamic_search_space_adjustment_trigger_size
          then resolve ~start:rightmost_start search_using_tz timeres
          else slice_batches ~start:rightmost_start interval_batches
        in
        let intervals_up_to_end_exc =
          interval_batches
          |> CCList.to_seq
          |> Seq.map (Intervals.Slice.slice ~skip_check:true ~end_exc)
          |> Intervals.Inter.inter_multi_seq ~skip_check:true
        in
        fun () ->
          Seq.Cons (intervals_up_to_end_exc, aux_inter' ~start:end_exc timeres)

where we do the skip based on a bundle of timere objects

within aux_union

  let rec aux_union' (timeres : t Seq.t) (intervals : Interval'.t Seq.t) =
    match intervals () with
    | Seq.Nil -> Seq.empty
    | Seq.Cons ((start, end_exc), rest) ->
      let open Timedesc.Span in
      let size = end_exc - start in
      if size >= dynamic_search_space_adjustment_trigger_size then
        let timeres = slice_result_space_multi_seq ~start:end_exc timeres in
        let next_intervals =
          resolve_and_merge timeres
          |> OSeq.drop_while (fun x -> Time.Interval'.le x (start, end_exc))
        in
        fun () ->
          Seq.Cons ((start, end_exc), aux_union' timeres next_intervals)
      else fun () -> Seq.Cons ((start, end_exc), aux_union' timeres rest)

I added some debug statements and can confirm the result space bounds were propagated correctly, and minutes seem to be very quick, so I am suspecting inefficiency in the pattern_resolver.

I'll try to optimise it and see where that leads us.

Mar 09 '22 03:03 darrenldl

Right now I suspect there are too many conversions from set to seq - I'll play around with pattern_resolver.ml when I have time later.

Many thanks for raising the issue!

Mar 09 '22 03:03 darrenldl

Hm...now I'm not very sure...

Aha...might have found the culprit...

Might be missing a call to slice_result_space_multi in aux_inter'

Mar 09 '22 04:03 darrenldl

Ooh right sorry, I missed the overapproximate optimization! Indeed it calculates the right lowerbound (= now), but then it searches from 2022/01/01 until today. Maybe it's because the pattern enumeration only uses the year of the lowerbound? (overall_search_start is reset to year/01/01 below)

https://github.com/daypack-dev/timere/blob/1e272e24bac0a5f88eed962581e14375aab3cf65/src/pattern_resolver.ml#L508-L515

There's still a lot of code that I haven't read, so maybe it's not this! Thank you for looking into it so fast :)

Mar 09 '22 18:03 art-w

Maybe it's because the pattern enumeration only uses the year of the lowerbound?

Oh yeah possibly! I'll need to remind myself of how all these work, it's been a year or more since I've last worked on them.

Mar 09 '22 23:03 darrenldl

I think you're spot on.

Through dune utop src/:

let x = Result.get_ok @@ Timere.(resolve
  (after (Timedesc.make_exn ~year:2022 ~month:01 ~day:01 ~hour:01 ~minute:01 ~second:01 ()) &&& seconds [0]))

is significantly faster than

let x = Result.get_ok @@ Timere.(resolve
  (after (Timedesc.make_exn ~year:2022 ~month:12 ~day:01 ~hour:01 ~minute:01 ~second:01 ()) &&& seconds [0]))

Mar 10 '22 03:03 darrenldl

Inefficiency of pattern resolver has been addressed in #56

Mar 11 '22 04:03 darrenldl

That was fast! Thanks a lot, it's much appreciated :)

I'm not impacted by the following, but after reading and thinking about the code, I believe there's an interesting edge case between the "search start time" optimization and lengthen:

# Timere.(resolve_exn
    (since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:0 ~second:0 ())
    &&& lengthen (Timedesc.Span.make_small ~s:(30 * 60) ()) (minutes [0]))) ;;
[2022 Jan 01 00:00:00 +00:00:00, 2022 Jan 01 00:31:00 +00:00:00)
[2022 Jan 01 01:00:00 +00:00:00, 2022 Jan 01 01:31:00 +00:00:00)
[2022 Jan 01 02:00:00 +00:00:00, 2022 Jan 01 02:31:00 +00:00:00)

If the date is increased by a minute, then the first interval is lost even though it still overlaps with since:

# Timere.(resolve_exn                                         (* vvvvvvvvv was 0 *)
    (since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:1 ~second:0 ())
    &&& lengthen (Timedesc.Span.make_small ~s:(30 * 60) ()) (minutes [0]))) ;;
[2022 Jan 01 01:00:00 +00:00:00, 2022 Jan 01 01:31:00 +00:00:00)
[2022 Jan 01 02:00:00 +00:00:00, 2022 Jan 01 02:31:00 +00:00:00)
[2022 Jan 01 03:00:00 +00:00:00, 2022 Jan 01 03:31:00 +00:00:00)

My understanding is that we would expect the first interval to be truncated by a minute, as it's what we get from a similar expression:

# Timere.(resolve_exn
    (since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:1 ~second:0 ())
    &&& (pattern_intervals `Whole
           (Points.make_exn ~lean_toward:`Earlier ~hour:0 ~minute:0  ~second:0 ())
           (Points.make_exn ~lean_toward:`Earlier ~hour:0 ~minute:31 ~second:0 ())))) ;;
[2022 Jan 01 00:01:00 +00:00:00, 2022 Jan 01 00:31:00 +00:00:00) (* <--- *)
[2022 Jan 02 00:00:00 +00:00:00, 2022 Jan 02 00:31:00 +00:00:00)
[2022 Jan 03 00:00:00 +00:00:00, 2022 Jan 03 00:31:00 +00:00:00)

I believe the "search start time" for patterns has to subtract the intervals durations to catch these overlapping ranges (that starts before but end after)... Anyway I'm not impacted at all by this behavior, but I could not resist thinking about the cool algorithms involved in timere!

Thanks again for your quick support!

Mar 11 '22 15:03 art-w

Yes indeed lengthen has been a source of headache for the result/search space adjustment, also shift.

(I usually leave discovery of these cases to fuzzer, but I lack the equipment for fuzzing right now sadly). But I think you found a case that wasn't caught by fuzzer previously in any case : D

Fixed now by #57 I believe

Mar 12 '22 03:03 darrenldl

Yes it works! Thanks a ton, I'm a big fan of your reactivity and I'm very happy of how easy the cron scheduling turned out be thanks to timere! Have a nice weekend :)

Mar 12 '22 12:03 art-w

Good to hear!

Right now somewhat waiting on timedesc 0.7.0 publish before submitting this version of timere, but if pointing to the git repo suffices for you then I'm not gonna try to hurry too much as I'm still busy with other stuff.

Mar 12 '22 12:03 darrenldl

Yes the git pin is great! There's absolutely no need to rush, please take your time :)

Mar 12 '22 13:03 art-w

timere timere copied to clipboard

Computing the next valid interval

timere
timere copied to clipboard