sidekiq-bulk icon indicating copy to clipboard operation
sidekiq-bulk copied to clipboard

Add push_bulk_in & push_bulk_in!

Open martinbottanek opened this issue 6 years ago • 2 comments

Allows you to specify when your jobs should run.

Internally, this works in the same way as Sidekiq’s perform_in.

martinbottanek avatar Oct 01 '19 11:10 martinbottanek

Bump. This looks helpful.

originalhat avatar Feb 23 '23 00:02 originalhat

My apologies for such a long delay in responding to this PR.

This library is in a somewhat interesting state, given Sidekiq changes since I last did any work on this repo. I thought I'd capture my thoughts here for context. Not everything is directly relevant to the change you're proposing, but once you start pulling on the thread...

The summary is that I'm unlikely to merge this PR as-is because there are some required refactors to catch up with Sidekiq changes. (There would also need to be some tests for the new methods! 😅)

The gist of the situation:

  • This repo pre-dates a few Sidekiq features that have since been added.
  • Sidekiq has added a perform_bulk method on job classes. That makes it harder to see why this library exists at all in 2023, save for a difference in API. (Being able to pass a block, for example.)
  • Later versions of Sidekiq makes Sidekiq::Client batch-aware, which undermines an assumption originally baked into this library's implementation.
  • It's possible that there is a simpler API that incorporates at + batching than what can be done with Sidekiq::Client today. This gem could provide that API.

More detail below.


Sidekiq v6.3.0 introduced perform_bulk (https://github.com/sidekiq/sidekiq/pull/5042) on job classes, with batched pushes. So with v6.3.0 or later, this works:

FooJob.perform_bulk([[1, "a"], [2, "b"], [3, "c"]], batch_size: 2)

Moreover, Sidekiq v7.1.0 included a change (https://github.com/sidekiq/sidekiq/commit/1b2c3e8d0ba77447f12a4fb5035cf238ced2d23c) which made Sidekiq::Client.push_bulk batch-aware, so this works:

Sidekiq::Client.push_bulk("class" => FooJob, "args" => [[1, "a"], [2, "b"], [3, "c"]], :batch_size => 2)

Without a batch_size specified, a default of 1000 is chosen.

(Later, the job-level perform_bulk was modified in https://github.com/sidekiq/sidekiq/commit/c1607198815d68f60d138010907dd3426d6521bb to use Client.push_bulk.)

And, of course, Sidekiq v6.0.1 made Client.push_bulk aware of at (https://github.com/sidekiq/sidekiq/pull/4243), which this PR relies on, so that this works:

Sidekiq::Client.push_bulk("class" => FooJob, "args" => [[1, "a"], [2, "b"], [3, "c"]], "at" => [1, 2, 3])

So, now what?

  1. If this library is used with Sidekiq v7.1.0 or later, the push_bulk method (from this library) is superseded by Sidekiq::Client.push_bulk's batching into groups of 1000. (This library will split jobs into groups of 10,000, but the underlying Client.push_bulk call will then group into chunks of 1000.)
  2. Similarly, this library'spush_bulk! no longer has the same semantics on v7.1.0 or later. It originally meant "push everything", but it will now batch by 1000.
  3. Since Sidekiq now provides perform_bulk on job classes, this library needs a stronger justification for its own existence. It originally provided what I saw as a fairly valuable operation that required a bunch of boilerplate, but Sidekiq's own perform_bulk is no more complicated than push_bulk. (It doesn't feel necessary to me to make this library exist as a replacement for perform_bulk for anyone on Sidekiq pre-v6.3.0.)
  4. Assuming it continues to exist, this library should be reworked to account for Sidekiq::Client.push_bulk's batch_size option.

With all that in place, at-aware bulk pushing has a fairly simple implementation as a generic version of this:

Sidekiq::Client.push_bulk(
  "class" => FooJob,
  "args" => [[1, "a"], [2, "b"], [3, "c"]],
  # An array of `Numeric`s, or `Time`s, or whatever else Sidekiq would accept.
  "at" => at_values,
  :batch_size => 2
)

A key question is whether this gem can provide a meaningful improvement over just using that approach. It might be able to.

aprescott avatar Oct 09 '23 21:10 aprescott