reactor icon indicating copy to clipboard operation
reactor copied to clipboard

Reactor "future-publishing-jobs"

Open winfred opened this issue 10 years ago • 17 comments

I'm moving a discussion started by @dashkb from a hired/hired issue so the discussion can live here in public. He has some really good ideas about how to reduce the complexities/opinions we recently thrust into reactor.

=== begin @dashkb quote ===

An idea I had while reading #995 that ended up getting kind of long and shouldn't jam up that PR.

All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections. -- David Wheeler

We want a way to publish an event when the event actually happens, which is going to be in the future. Some concerns we have are:

  • The time at which the event should fire, available to us now, may change, and so a job scheduled to fire an event at that moment may sometimes require rescheduling. Timestamps like this are typically stored conveniently in columns named things like start_at, end_at, expire_at, explode_at, etc.
  • We have some guard code we want to use at the last possible second to decide whether to fire the event at all.

I imagine this API:

class SomeModel
  # this model has `start_at` and `end_at` columns
  at_time :start do
    publish :whatever if things_are_still_cool?
  end

  at_time :end do
    if relation.state.acceptable?
      do_things!
    end
  end
end

This could be accomplished with a recurring/polling job which monitors all of the timestamp columns that have callbacks like these registered, and when their values move into the past, execute the block.

  • To reschedule a future-publishing job you just update the *_at column
  • None of your state checking code runs until the time
  • You don't even have to fire an event, or you can choose what events to fire conditionally, or whatever.

winfred avatar Dec 20 '14 18:12 winfred

I love the creativity with API design.

At the moment things like :start_at need to be more than columns, or we'd have to rewrite at least one event in hired/hired since it uses methods to calculate the execution time. (This isn't impossible, would just need a second column that gets maintained on update or something)

publishes :batch_prepared, at: :pre_batch_reminders_time, watch: :start_at

I'm totally not opposed to caching something like that in an extra column, especially because it simplifies the reactor codebase. We can remove 'publishing later' as a feature, which makes so much since because 'doing anything later' is really just a feature of Sidekiq.

This feels like a really solid direction to go in, and I love the David Wheeler quote. Spot on.

I'm curious what other reactor familiars think about this. @heythisisnate @schlende ?

winfred avatar Dec 20 '14 18:12 winfred

I really like this more and more as I think about it because we can remove another huge/annoying piece from reactor that was added for this feature. We had to introduce a requirement that all events have an actor in order support the if-lambda syntax for these future jobs, and I really didn't want to start forcing that kind of opinion from the reactor side. (well, there is the 'convention can be useful' argument, but i've specifically avoided forcing the event signature until now...) so this layer of indirection would let us undo that decision and make me smile a whole lot.

winfred avatar Dec 20 '14 18:12 winfred

So then where would this lovely DSL live? It sounds like either another gem entirely, or perhaps we could find some opt-in room in a new module like Reactor::LaterGator -> but the fact that I'm struggling come up with a project-related name for this leads me to believe it has so little to do with Reactor/PubSub that it shouldn't live in this gem.

winfred avatar Dec 20 '14 18:12 winfred

I think that the ability for reactor to publish jobs in the future (and execute state code at that time) is useful so it should be something we include.

I'm wondering -> how would we set up a polling job elegantly so that a user of reactor would not have to do a lot of app specific cron configuration to enable the job polling.

Also, it seems to me that we're suggesting that we examine every record in of (in Kyle's example) SomeModel on each iteration of this polling interval. This doesn't seem efficient to me at all. If you wanted to use this feature in a table with a million rows and your poll interval was every minute then you'd be fetching and examining those million rows every minute. Obviously 1 minute resolution may not be required but still...

Also, as winfred pointed out in our current implementation we are often not watching a db column. This is something we would need to solve.

I guess what I do like about Kyle's idea is that it de-couples us from Redis and dumps us into a more stable situation with state maintenance. We wouldn't have to worry about a redis wipe preventing all of our scheduled jobs from ever firing.

Still... I feel uneasy about this solution because of the configuration and efficiency issues. Do you guys have ideas about how we could address those issues?

schlende avatar Jan 05 '15 21:01 schlende

Ok... digesting this idea more

So performance may not be too big of a deal.

The db can probably give us a manageable list of records. I guess it's the other two issues. How would we set up a job to periodically poll these future jobs? && how would we handle non-db column cases?

Could our block look for a method OR a db column?

at_custom_time :start do
    publish :whatever if things_are_still_cool?
end

def custom_time
   start_at + 1.hour
end

Thoughts?

schlende avatar Jan 05 '15 21:01 schlende

The db can probably give us a manageable list of records.

Awesome, I was about to use way too many words to say just that.

How would we set up a job to periodically poll these future jobs?

Do you mean the mechanics of a polling job? The way we do it for the Heroku Autoscaler should be fine... I may have misunderstood the question.

how would we handle non-db column cases?

I think using a custom time block/method like you're suggesting might lead to the show-stopping performance problem you originally identified (having to execute some code for every record, or a too-large set of records). It definitely adds complexity, and may be tough to debug. Is it too much to ask of users of this feature that they calculate and store a time in advance? Do we have any current or upcoming use cases where it's not possible?

I think @wnadeau's idea of caching some value in an extra column (effectively turning every case into the simple "did this timestamp just move into the past?" case) is good. This could be supported with some magic.

dashkb avatar Jan 05 '15 22:01 dashkb

I guess we haven't solved for cases like this (silly) example:

at_custom_time :my_time { whatever }
def my_time
  start_at + random_number_of_seconds
end

i.e. where there's no way around running code to get the execution time.

but I think these cases can probably be rewritten to a "check and reschedule" form.

dashkb avatar Jan 05 '15 22:01 dashkb

at_custom_time :my_time { whatever }
def my_time
  start_at + random_number_of_seconds
end

hmmmmm, a cache-column does solve the case, we would just have to wire it up. (either let the consumers manually set the right after_commit hooks or provide a little DSL to make it clear) I was thinking the value returned by #my_time would just have to be calculated and set in a datetime column when the record is saved. The only weird gotcha someone might hit is if they are writing code in that method expecting it to be run any other time. They'll just have to mind the execution time. (but that's easy to document / make part of the contract)

re: @schlende

How would we set up a job to periodically poll these future jobs?

The at_custom_time :column_name DSL would collect all of the tables and columns that need to be queried by the poller. The poller just snags the records where the the column is in the past...

How does the poller know not to schedule the job a second time? (that it's been polled and passed to sidekiq)

A second polled_at column? (bleh)

How would we handle non-db column cases?

We wouldn't. All of the existing use cases for non-db columns (just the :auction_prepared in hired/hired) can be converted to db columns.

Despite some of the trade-offs (and remaining untied/loose ends), I'm still leaning toward this functionality being a gem separate from reactor because it would keep reactor quite simple. (performing a given ruby block in sidekiq at a previously calculated time has little to do with pub-sub.)

winfred avatar Jan 05 '15 22:01 winfred

How does the poller know not to schedule the job a second time? (that it's been polled and passed to sidekiq)

I was thinking something like:

  • poller knows last_poll_time
  • acts on jobs with *_at between now and last_poll_time

a gem separate from reactor

:+1:

dashkb avatar Jan 05 '15 22:01 dashkb

@dashkb I like the pattern we use in Heroku Autoscaler. Makes sense.

Agree with the between times idea too.

@wnadeau I agree it should be a separate gem... actually I'd be somewhat surprised if there wasn't something out there to do this already... hmmm... well couldn't find it in 5 mins so maybe it doesn't exist and we need to write it ;)

schlende avatar Jan 06 '15 03:01 schlende

https://www.pivotaltracker.com/story/show/85829676

winfred avatar Jan 10 '15 04:01 winfred

noticed we use clockwork in hired/hired now, looks pretty simple to run that polling job with this

winfred avatar Apr 13 '15 05:04 winfred

I started working on this today in-flight, in case anyone was wondering / looking for something to hack on. Battery died but I should be able to get something pushed tomorrow

winfred avatar Apr 20 '15 19:04 winfred

Also addressing some comments from @schlende at lunch regarding the common use-case of a datetime that is derived from an existing column. Specifically this kind of thing in the old paradigm:

  publishes :auction_prepared, at: :pre_auction_reminders_time, watch: :start_at

  def pre_auction_reminders_time
    start_at - 72.hours
  end

  def pre_auction_reminders_time_was
    (previous_changes[:start_at].try(:[], 0) || start_at) - 72.hours
  end

He was hoping to avoid having a column in the database for these, or at least avoid having to write a lot of active model hook / attr_writer overrides to store the derived value as a column. After thinking about this for a bit, I'm afraid we'd incur too much complexity by trying to not use a database column to query the records & track job-block (observation) state.

So instead I'll propose this extension to the DSL that will reduce the glue without giving up the simplicity of polling datetime columns.

at_time(:pre_auction_reminders_time, derive: -> { start_at - 72.hours }) do
  do_something
end

If the derive option is supplied, Chron will just run the lambda & write its return value to the pre_auction_reminders_time column before_save for you

winfred avatar Apr 24 '15 03:04 winfred

And now that I see this, I wonder how long until a new/unfamiliar dev tries to do something like this

at_time(:remind_of_incomplete_profile_at, derive: -> { created_at + 3.hours }) do
  ProfileReminder.perform_later
end

What's kind of fucked up / scary about this is that our current code review process would almost certainly assign someone that can't immediately see why this doesn't make sense and "may not be the right tool for the job".

In fact, I'm not even sure if that doesn't make sense as a way to trigger a reminder of sorts. Like is that better or worse than something like

on_event :user_created do
  ProfileReminder.perform_in(3.hours)
end

In this sense, I'm afraid of the variability we are introducing to "how we do things later". It's a nontrivial cost to pay in order to achieve a smaller, ActiveJob-dependent Reactor. We could diminish the cost if the code/docs/arch makes clear the use-cases for each way to "do something later".

Maybe the rule of thumb could be "prefer binding to events for business logic & fire system events with Chron when one doesn't exist at the time you need". This rule of thumb appears in line with the way we are actually using it in hired/hired#2097, the supplied Chron observation blocks simply publish a reactor event.

Yeah let's just be sure to keep up the expectation that is somewhat experimental. Though I am afraid the complexities we are considering here are expressed in both paradigms, except that Reactor::Publishable made strict the coupling between this DSL and the fact that you are publishing an event with it.

winfred avatar Apr 24 '15 03:04 winfred

prefer binding to events for business logic & fire system events with Chron when one doesn't exist at the time you need

Hmm... not sure I get it. It won't always make sense to fire an event from Chron and then bind to it, otherwise we'll end up with events like three_hours_after_developer_created (or maybe developer_profile_reminder_time) which exist only to get a job to fire at some time. The case in https://github.com/hired/hired/pull/2097 is the way it is because lots of things are interested in auction_started but we need to fire it at a time that might be changed by admins and so can't be enqueued for a specific time.

So my attempt at codifying:

  • Fire an event if it makes sense, like something has a variable start time (like auction_started)
  • If you're just doing one thing at a particular (variable) time, queue a job (like delivering pre auction reminders)
  • For static times, like created_at, don't use Chron

dashkb avatar Apr 24 '15 18:04 dashkb

otherwise we'll end up with events like three_hours_after_developer_created (or maybe developer_profile_reminder_time)

yeah that would be no good. will just have to see what shakes out at the margins.

winfred avatar Apr 27 '15 04:04 winfred