[FEATURE REQUEST] Jobs sequential run

Open krhubert opened this issue 1 year ago • 8 comments

I haven't seen this in the api/docs, so please let me know if this is currently possible or not.

Description

Imagine a shopping mall with only one unloading point. Trucks (jobs) queue up to unload their cargo one by one. This can't be done in parallel because the current infrastructure does not allow it.

Tech description

Allow adding jobs that are guaranteed to run sequentially. A job's "sequentiality" might be configured similarly to job uniqueness: ByArgs, ByPeriod, ByQueue, ByState, or ByKey (see below). The difference between sequential and unique is that a sequential insert is guaranteed to add the job to the queue, but the job is not executed until the jobs ahead of it in the same "pool" have completed.

API proposal

There are two approaches to adding a ByKey API. One keeps it simple and only allows a string as the key. The other uses generics, but then the insert API becomes more complex.

// Option 1: plain string key.
type SequentialOpts struct {
	ByKey string
}

// Option 2: generic key.
type SequentialOpts[K comparable] struct {
	ByKey K
}

type OrganizationArgs struct {
    OrganizationId int
    // other args relevant to this task 
}

func (OrganizationArgs) Kind() string { return "organization_task" }

func (a OrganizationArgs) InsertOpts() river.InsertOpts {
    return river.InsertOpts{
        SequentialOpts: river.SequentialOpts{
            ByKey: strconv.Itoa(a.OrganizationId),
        },
    }
}

Open questions

  1. Should the order of job execution be controllable? It could be FIFO, LIFO, random, or custom-defined.
  2. Maybe InsertOpts.Tags could handle this functionality. The docs say tags are only used for grouping; maybe this field could be "promoted".

Workaround

I can use a distributed lock (Redis or PostgreSQL based). When a job with a given arg starts to execute, I can try to obtain the lock and, if that's not possible, snooze the job for a specific duration.
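The lock-and-snooze workaround can be sketched roughly as follows. This is a minimal, single-process illustration only: `keyLocker` is a hypothetical in-memory stand-in for a real distributed lock (Redis, PostgreSQL advisory locks, and so on), and `snoozeError` is a hypothetical placeholder for whatever snooze signal the job queue provides. Neither is part of River's API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// keyLocker is a hypothetical in-memory stand-in for a distributed lock.
// It only provides mutual exclusion within a single process.
type keyLocker struct {
	mu   sync.Mutex
	held map[string]bool
}

func newKeyLocker() *keyLocker { return &keyLocker{held: make(map[string]bool)} }

// TryLock acquires the lock for key without blocking and reports success.
func (l *keyLocker) TryLock(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held[key] {
		return false
	}
	l.held[key] = true
	return true
}

// Unlock releases the lock for key.
func (l *keyLocker) Unlock(key string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.held, key)
}

// snoozeError is a hypothetical signal asking the queue to retry the job later.
type snoozeError struct{ after time.Duration }

func (e *snoozeError) Error() string { return fmt.Sprintf("snooze for %s", e.after) }

// snoozeFor is an arbitrary retry delay chosen for this sketch.
const snoozeFor = 5 * time.Second

// runExclusive runs work only if no other job currently holds the lock for
// key; otherwise it returns a snoozeError so the job can be retried later.
func runExclusive(l *keyLocker, key string, work func() error) error {
	if !l.TryLock(key) {
		return &snoozeError{after: snoozeFor}
	}
	defer l.Unlock(key)
	return work()
}

func main() {
	locks := newKeyLocker()
	err := runExclusive(locks, "org-42", func() error {
		// A second job for the same key arrives while the first is running.
		inner := runExclusive(locks, "org-42", func() error { return nil })
		var s *snoozeError
		fmt.Println("inner snoozed:", errors.As(inner, &s))
		return nil
	})
	fmt.Println("outer error:", err)
}
```

Note that this approach only prevents concurrent execution for a key. It gives no ordering guarantee, since a snoozed job may be retried after a job that was inserted later.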

krhubert, Aug 22 '24 14:08

Yeah, +1. I think we're going to implement this, but it's a little trickier because it requires some more invasive changes to the locking queries. It's on my list of next ~2-3 features to do.

Similar to:

https://github.com/riverqueue/river/discussions/431

brandur, Aug 22 '24 14:08

Do you have a rough ETA for this one?

krhubert, Aug 22 '24 15:08

We've been pretty bad historically at delivering precise estimates on that sort of thing. I'd say a major feature usually takes ~2 weeks including reviews and everything, and this is 2-3 away, so maybe a month optimistically or two less optimistically?

That said, I'm going to look a little closer at it, and if I can spot a way to do it relatively easily, I might take it as the next feature after batch processing, so maybeee sooner.

brandur, Aug 22 '24 15:08

Thanks, this is great!

krhubert, Aug 22 '24 15:08

+1 to this feature. We are loving river here at my job but this is holding up our adoption a little. We want to architect our jobs to be very small and allow for serial execution based on an id. Thanks for accepting this and we look forward to any progress.

mauza, Sep 10 '24 22:09

@mauza can you give any more info about how you’d want to use this? There are sort of two features conflated together here and I’d like to understand which one you’re more interested in or if you need to use them together for some reason:

  1. Ability to ensure that for a given partition key, the jobs execute one at a time and in the order they were inserted.
  2. Ensuring that only one job (or some larger number of jobs) with a given partition key can run at once globally, with no particular consideration given to a precise sequence order.
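To make the difference between these two behaviors concrete, here is a minimal in-memory sketch in Go. The `job` struct and both functions are hypothetical illustrations of the semantics only; they are not River's API or its implementation, and they assume finished jobs have already been removed from the slice and that the slice is sorted by insertion order.

```go
package main

import "fmt"

// job is a hypothetical, simplified view of an unfinished job.
type job struct {
	ID      int    // insertion order: lower IDs were inserted earlier
	Key     string // partition key
	Running bool   // true if the job is currently executing
}

// eligibleSequential models behavior 1: a pending job may start only if it is
// the oldest unfinished job for its key, so jobs run one at a time in
// insertion order per key.
func eligibleSequential(jobs []job) []int {
	seen := map[string]bool{}
	var out []int
	for _, j := range jobs {
		if !seen[j.Key] && !j.Running {
			out = append(out, j.ID)
		}
		seen[j.Key] = true
	}
	return out
}

// eligibleLimited models behavior 2: any pending job is a candidate while
// fewer than limit jobs with its key are running, regardless of order.
// Starting one candidate may make other candidates for that key ineligible.
func eligibleLimited(jobs []job, limit int) []int {
	running := map[string]int{}
	for _, j := range jobs {
		if j.Running {
			running[j.Key]++
		}
	}
	var out []int
	for _, j := range jobs {
		if !j.Running && running[j.Key] < limit {
			out = append(out, j.ID)
		}
	}
	return out
}

func main() {
	jobs := []job{
		{ID: 1, Key: "a"},
		{ID: 2, Key: "a"},
		{ID: 3, Key: "b", Running: true},
		{ID: 4, Key: "b"},
		{ID: 5, Key: "c"},
	}
	fmt.Println(eligibleSequential(jobs)) // prints [1 5]
	fmt.Println(eligibleLimited(jobs, 1)) // prints [1 2 5]
	fmt.Println(eligibleLimited(jobs, 2)) // prints [1 2 4 5]
}
```

Under the sequential rule, job 2 must wait for job 1 even though nothing with key "a" is running. Under the limit rule, either job 1 or job 2 may be picked first.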

bgentry, Sep 10 '24 23:09

> Ensuring that only one job (or some larger number of jobs) with a given partition key can run at once globally, with no particular consideration given to a precise sequence order.

This is something we would like to have. We are looking at migrating to River and in our current solution we are using Redsync in one of our jobs to achieve this. The order in which the jobs run is not critical for us, as long as they never run concurrently for a given partition key.

haines, Sep 27 '24 15:09

> Ability to ensure that for a given partition key, the jobs execute one at a time and in the order they were inserted.

I wonder if it's theoretically possible to support any arbitrary order?

nexovec, Sep 27 '24 22:09

Hi everyone, we added a "sequences" feature to River Pro in the latest release: https://riverqueue.com/blog/river-pro-sequences

This should address the use case here. We're still working on other ways of limiting concurrency for specific jobs, particularly when the desired concurrency is greater than one 😄

bgentry, Oct 11 '24 01:10