activejob-status icon indicating copy to clipboard operation
activejob-status copied to clipboard

Implement job batches

Open matteeyah opened this issue 1 year ago • 7 comments

Summary

Implements job batches. This is used to group multiple job and "calculate" a single status for the group of jobs.

This is useful when you want to schedule multiple batches of work for a single resource in parallel, but you want to know when all of them complete or if one fails.

It's a really simple implementation that doesn't add any additional complexity, just adds a way to calculate a single status for a group of statuses.

Background

We're looking to switch from Sidekiq to SolidQueue, but we use Sidekiq batches. This gem provides a lot of the underlying mechanics, but not the actual batches. We'd love to be able to complement SolidQueue with this gem to achieve equivalent functionality we had in Sidekiq.

matteeyah avatar Jan 14 '24 00:01 matteeyah

@inkstak Sorry for not asking first, but I took a stab at implementing batches for ActiveJob::Status. I'd appreciate if you took a look when able.

matteeyah avatar Jan 14 '24 00:01 matteeyah

Thanks, I’m really interested, but I won’t be available this week. I come back to you next week.

inkstak avatar Jan 15 '24 12:01 inkstak

@inkstak Any updates on this? I added some docs on callbacks as well.

matteeyah avatar Jan 25 '24 14:01 matteeyah

As I said, I'm really interested : Two years ago we left ActiveJob behind for a while and focused on purely Sidekiq workers for better performances. We run millions of jobs per month, most of them in complex Sidekiq Batches, and some of these batches includes thousands of jobs.

Since few months, I'm considering replacing Sidekiq Batches by AcidicJob, moving back to ActiveJob, and why not replacing Sidekiq by SolidQueue.

I haven't thought about using ActiveJob::Status to monitor batches so your idea make me thinking a lot. I don't think it'll match with high volumes but it could be an interesting alternative for small volumes.

Few things to improve:

1 - I'd prefer to pass an array of jobs as an argument if we had to pass dozens of jobs:

ActiveJob::Status::Batch.new(array_of_jobs)

2 - We could benefit from read_multi to read all statuses in only one roundtrip.

This can be achieved by implementing the method ActiveJob::Status::Storage.read_multi(array_of_jobs) and then fetch their statuses in ActiveJob::Status::Batches on demand:

module ActiveJob
  module Status
    class Batch
      def initialize(jobs)
        @jobs = jobs
        @storage = ActiveJob::Status::Storage.new
      end

      def statuses
        read.pluck(:status)
      end

      # ....

      private

      def read
        @storage.read_multi(@jobs)
      end

We could then implement other methods to get an average progress of the batch, the raised exceptions, etc..

inkstak avatar Jan 25 '24 21:01 inkstak

I don't think it'll match with high volumes but it could be an interesting alternative for small volumes.

I really like ActiveJob::Status - it's very simple, but still manages to provide support for batches, and even simple callbacks. It might not support large volumes, but starting off with support for smaller volumes seems very reasonable. There's also people interested in this functionality in https://github.com/basecamp/solid_queue/issues/87. I'm convinced we'll find a way to scale this solution over time.


1 - I'd prefer to pass an array of jobs as an argument if we had to pass dozens of jobs:

Done!

2 - We could benefit from read_multi to read all statuses in only one roundtrip.

Done!

I've had to change the logic for calculating the statuses slightly. statuses.pluck(:status) doesn't work, because of how read_multi works - it returns a hash with keys being the cache keys and values being the cache hits. However, if there's no cache hit, that cache key isn't included in the result hash. Here's an example:

cache_keys = ['job_1', 'job_2', 'job_3']
cache.read_multi(*cache_keys)
# { 'job_1' => { status: 'finished' }, 'job_2' => { status: 'failed' }, 'job_3' => { status: 'finished' } }

However, if there's a cache miss, here's what happens

cache_keys = ['job_1', 'job_2', 'job_3']
cache.read_multi(*cache_keys)
# { 'job_1' => { status: 'finished' }, 'job_3' => { status: 'finished' } }

It might appear that all jobs finished successfully, but in reality, we missed the status for job_2. That's why I changed the implementation to make sure that there's a cache hit for every single job - so we know exactly what its status is.


@inkstak Can you take another look?

matteeyah avatar Jan 27 '24 13:01 matteeyah

The head tests are failing because base64 got moved to a bundled gem after 3.3.0 - https://www.ruby-lang.org/en/news/2023/12/25/ruby-3-3-0-released/ (see the "Standard Library" section)

matteeyah avatar Jan 27 '24 16:01 matteeyah

The head tests are failing because base64 got moved to a bundled gem after 3.3.0 - https://www.ruby-lang.org/en/news/2023/12/25/ruby-3-3-0-released/ (see the "Standard Library" section)

:+1:

I'll fix this and release a new version soon.

inkstak avatar Jan 27 '24 16:01 inkstak

FYI, I have a PR open to add batch support to SolidQueue: https://github.com/basecamp/solid_queue/pull/142

It hasn't gotten any response yet, so I'm not sure how it'll turn out in the end. I opened it with a partial but functional implementation to get the conversation started and hopefully land batch support in the next few months.

jpcamara avatar Feb 15 '24 10:02 jpcamara

Merged in https://github.com/inkstak/activejob-status/compare/v1.0.1...v1.1.0.beta.0. I'll release a beta version.

inkstak avatar Feb 18 '24 17:02 inkstak

@inkstak Just wanted to circle back and thank you for the help again. We've been using this in production for a while now and it's working without a hitch!

matteeyah avatar Apr 25 '24 10:04 matteeyah