sidekiq-iteration

Pull into Sidekiq core?

Open mperham opened this issue 9 months ago • 4 comments

Hey @fatkodima, would you be interested in integrating this functionality into Sidekiq core for 7.3 or have me do it? I've had several customers report this gem as very useful for solving their problems with long-running jobs, making deployments quicker and safer, etc. I think it's a good pattern/API to encourage people to use.

mperham, Apr 25 '24 16:04

Hey! Wow, that's awesome to get this merged into Sidekiq itself!

I will try to do that this weekend (or next weekend) and see how it goes. Let me know if you plan to release 7.3 sooner.

fatkodima, Apr 25 '24 16:04

I have a 7.3 milestone targeting a summer release. 7.2.3 will be out very soon.

mperham, Apr 25 '24 17:04

Wanted to ask, what API would you prefer?

Option 1 (my preference):

class MyJob
  include Sidekiq::Job
  include Sidekiq::Iteration
end

Option 2, or something like it:

class MyJob
  include Sidekiq::Job
  sidekiq_options iteration: true, ...
end

And what API would you prefer for throttling (https://github.com/fatkodima/sidekiq-iteration/blob/master/guides/throttling.md)? Currently it is configured via a top-level call in the class body.

fatkodima, May 02 '24 13:05

I'd probably go with:

class SomeJob
  include Sidekiq::Job
  include Sidekiq::Job::Iterable

  sidekiq_options iteration: { whatever: 123 }
end

Unlike Rails, I dislike top-level class methods like throttle_on, as they can be hard to test and mock. I would prefer that be an instance method; server middleware provides an instance:

class ThrottleMiddleware
  include Sidekiq::ServerMiddleware

  def call(instance, job, queue)
    if instance.throttle_on?
      # do something
    end
  end
end

mperham, May 02 '24 19:05
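
To illustrate the testability argument in the comment above, here is a minimal standalone sketch. It deliberately leaves out `include Sidekiq::ServerMiddleware` so it runs without the gem, and the return value `:throttled` is only a placeholder for the "do something" branch. `throttle_on?` is the name used in the comment; its semantics here are an assumption.

```ruby
# Stand-in job class; a real one would `include Sidekiq::Job`.
class SomeJob
  # Assumed instance-level predicate: should this job be throttled right now?
  def throttle_on?
    false
  end
end

# Same call signature as a Sidekiq server middleware:
# (job instance, job payload hash, queue name), with a block for the job itself.
class ThrottleMiddleware
  def call(instance, job, queue)
    if instance.respond_to?(:throttle_on?) && instance.throttle_on?
      :throttled # placeholder for "do something", e.g. rescheduling the job
    else
      yield
    end
  end
end

# In a test, the throttled case is just another instance that answers true;
# no class-level configuration has to be set up or reset.
class ThrottledJob < SomeJob
  def throttle_on?
    true
  end
end

middleware = ThrottleMiddleware.new
middleware.call(SomeJob.new, {}, "default") { :performed }      # runs the block
middleware.call(ThrottledJob.new, {}, "default") { :performed } # skips the block
```

Because the decision lives on the instance, a test can pass any object that responds to `throttle_on?`, which is harder to do when throttling is declared once in the class body.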

As a suggestion, @mperham: I feel like the framework should be pulled into Sidekiq, but not the concrete implementations.

Using AR directly can be suggested instead, as I reported in #9:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.in_batches(start: cursor) do |relation|
      yielder.yield(relation, relation.maximum(:id))
    end
  end
end

def each_iteration(relation)
  relation.update_all(...)
end

Or for batches:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.find_in_batches(start: cursor) do |batch|
      yielder.yield(batch, batch.last.id)
    end
  end
end

def each_iteration(batch)
  batch.each { ... }
end

Or for individual records:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.find_each(start: cursor) do |record|
      yielder.yield(record, record.id)
    end
  end
end

def each_iteration(record)
  record.update(...)
end

It feels like shipping the CSV, Array, and AR enumerators may be too much. I'm not sure; I'm just throwing ideas out here.

sobrinho, May 07 '24 20:05
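
The `build_enumerator(cursor:)` / `each_iteration` contract used in the snippets above can be tried without ActiveRecord. The sketch below is plain Ruby: `ArrayJob` and `run_iterations` are hypothetical names, and the driver only approximates what the framework would do (persist the cursor yielded with the last completed iteration, then pass it back on resume). Here the cursor means "last completed item", so the enumerator resumes after it.

```ruby
# Hypothetical job following the build_enumerator / each_iteration contract.
class ArrayJob
  RECORDS = [10, 20, 30, 40, 50].freeze

  attr_reader :processed

  def initialize
    @processed = []
  end

  # Yields [record, cursor] pairs; the cursor is the record's index.
  # A nil cursor means "start from the beginning"; otherwise resume after it.
  def build_enumerator(cursor:)
    start = cursor.nil? ? 0 : cursor + 1
    Enumerator.new do |yielder|
      RECORDS.each_with_index do |record, index|
        next if index < start
        yielder.yield(record, index)
      end
    end
  end

  def each_iteration(record)
    @processed << record
  end
end

# Hypothetical driver. Stops after `stop_after` iterations to simulate an
# interruption (such as a deploy) and returns the cursor to persist.
# Returns nil once the enumerator is exhausted.
def run_iterations(job, cursor: nil, stop_after: nil)
  count = 0
  job.build_enumerator(cursor: cursor).each do |record, item_cursor|
    job.each_iteration(record)
    cursor = item_cursor
    count += 1
    return cursor if stop_after && count >= stop_after
  end
  nil
end

job = ArrayJob.new
saved = run_iterations(job, stop_after: 2) # interrupted after two records
run_iterations(job, cursor: saved)         # resumes with the third record
job.processed                              # => [10, 20, 30, 40, 50]
```

One detail worth checking when adapting the AR snippets: `start:` in `find_each`, `find_in_batches`, and `in_batches` is inclusive, so if the saved cursor is the id of the last processed record, that record would be visited again on resume unless the enumerator skips it.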

Having optimized support for a few well-known types/libraries is useful, but we should have generic Enumerable support too.

mperham, May 07 '24 21:05
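
As a rough idea of what generic Enumerable support could look like, the sketch below wraps any Enumerable into an enumerator of `[item, cursor]` pairs. `enumerable_enumerator` is a hypothetical helper name, not an API from Sidekiq or sidekiq-iteration, and using the zero-based position as the cursor is an assumption.

```ruby
# Hypothetical helper: turns any Enumerable into a lazy enumerator of
# [item, cursor] pairs, where the cursor is the item's zero-based position.
# A nil cursor starts from the beginning; otherwise iteration resumes
# after the item at that position.
def enumerable_enumerator(enumerable, cursor:)
  skip = cursor.nil? ? 0 : cursor + 1
  enumerable.each_with_index.lazy.drop(skip)
end

enumerable_enumerator(1..5, cursor: nil).first(2)    # => [[1, 0], [2, 1]]
enumerable_enumerator(%w[a b c], cursor: 0).to_a     # => [["b", 1], ["c", 2]]
```

The trade-off is that `drop` still walks the skipped items, so resuming costs time proportional to the cursor. That is where optimized enumerators for specific types (for example, seeking by primary key) would still pay off alongside a generic fallback like this.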