findAndModify query not using indexing causing slowness on lock

Open vinioliveira opened this issue 4 years ago • 25 comments

Hello All,

After updating from version 2.1 to 3.1, I noticed that the MongoDB server started using far too much CPU. The cause is the query used by findAndModify: we're on MongoDB 3.4, which has an issue with nested $and/$or. That issue has been fixed in newer MongoDB versions; however, if the library supports MongoDB 3.4, we should either provide a way to use the old query or drop support for MongoDB 3.4.

https://jira.mongodb.org/browse/SERVER-32441

https://github.com/agenda/agenda/blob/master/lib/agenda/find-and-lock-next-job.js#L39-L51

    const JOB_PROCESS_WHERE_QUERY = {
      $and: [{
        name: jobName,
        disabled: {$ne: true}
      }, {
        $or: [{
          lockedAt: {$eq: null},
          nextRunAt: {$lte: this._nextScanAt}
        }, {
          lockedAt: {$lte: lockDeadline}
        }]
      }]
    };
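
For anyone who wants to experiment on an affected server, one option is to copy the shared predicates into each $or branch so that no $or sits inside an $and. The sketch below is only an illustration: `buildFlatLockQuery` is a hypothetical helper, not code from Agenda, and whether the planner actually picks a better plan for this shape would still have to be checked with .explain() on your own data.

```javascript
// Hypothetical helper (not part of Agenda). It builds a filter that is
// logically equivalent to JOB_PROCESS_WHERE_QUERY above, but with the
// shared predicates copied into each $or branch, so that no $or is
// nested inside an $and.
function buildFlatLockQuery(jobName, nextScanAt, lockDeadline) {
  const common = { name: jobName, disabled: { $ne: true } };
  return {
    $or: [
      // Unlocked jobs that are due to run.
      { ...common, lockedAt: { $eq: null }, nextRunAt: { $lte: nextScanAt } },
      // Jobs whose lock has expired.
      { ...common, lockedAt: { $lte: lockDeadline } }
    ]
  };
}

// Example values; the 10-minute lock lifetime is an assumption for the demo.
const now = new Date();
const tenMinutesAgo = new Date(now.getTime() - 10 * 60 * 1000);
console.log(JSON.stringify(buildFlatLockQuery('processEvent', now, tenMinutesAgo), null, 2));
```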

vinioliveira avatar May 29 '20 14:05 vinioliveira

Hey @vinioliveira, I am seeing very high CPU usage as well, and that's why I found your issue.

But I am not using [email protected] but the latest [email protected]. Agenda version is the same [email protected].

The query is of course the same as well (taken from the mongo log instead of the agenda code):

{
  $and: [ {
    name: "processEvent",
    disabled: { $ne: true }
  },
  {
    $or: [ {
      lockedAt: { $eq: null },
      nextRunAt: { $lte: new Date(1590843651350) }
  }

I am still investigating what might be the issue. Are you sure that you are seeing the high CPU usage because of the bug you have linked to?

jaschaio avatar May 30 '20 13:05 jaschaio

@jaschaio Positive. We tested with both the new and the old query; once we switched to the old query, CPU usage went back to normal. One thing we changed in our setup was lockLimit, to stop agenda from locking all the pending jobs at once, which makes things slower if you have a great number of pending jobs.

vinioliveira avatar Jun 05 '20 12:06 vinioliveira

Thanks, I switched to Bull. Agenda is great depending on your use case, but if you have lots of small tasks that need to run as fast as possible it’s not a good solution.

jaschaio avatar Jun 05 '20 15:06 jaschaio

Unfortunately, we are experiencing the same symptoms using the new version (actually v4.0.1, not v3.1) with MongoDB 3.6: the DB CPU consumption grows up to 100%. The query changed in https://github.com/agenda/agenda/pull/869. @simison @koresar, as you folks are more familiar with this change, could you please help us figure this out?

talik077 avatar Feb 07 '21 16:02 talik077

@talik077 we don't have such a large DB as yours. We can't reproduce the issue to act on it properly.

How about you find the best query, give it to us, so that we could release it as a bugfix?

koresar avatar Feb 07 '21 21:02 koresar

Unfortunately, we are experiencing the same symptoms using the new version (actually v4.0.1, not v3.1) with MongoDB 3.6: the DB CPU consumption grows up to 100%. The query changed in #869. @simison @koresar, as you folks are more familiar with this change, could you please help us figure this out?

What indexes do you have right now on agenda collection? Can you copy the output of mongodb query planner (.explain()) of the agenda query?

simllll avatar Feb 07 '21 21:02 simllll

@simllll Thanks for jumping in. .explain() is a great idea!

Someday we will need to add plenty of comments next to and inside JOB_PROCESS_WHERE_QUERY, so that future releases break fewer performance optimisations.

koresar avatar Feb 08 '21 02:02 koresar

To follow up on my previous comment: while bullmq does seem like a better option for a lot of messages that need to execute as fast as possible, I have since updated and use rabbitmq now. I still use agenda for scheduling, but it only proxies messages to rabbitmq, which at least for me seems to be the optimal use case.

jaschaio avatar Mar 24 '21 10:03 jaschaio

@vinioliveira @jaschaio I'm having this same issue. Which versions are y'all finding to be the most stable right now? Did you ever find a solution to this?

wootwoot1234 avatar Apr 21 '21 20:04 wootwoot1234

I have also encountered the same problem and it was a huge pain at scale with more than 100K jobs / day. After several years of use we had to switch to bull.

axelmarciano avatar Aug 04 '21 00:08 axelmarciano

Hi there! What is the status of this issue? We have this issue too. We ran explain on the find query (all MongoDB versions from 3.6.6 to 6.0.2) and it seems it does NOT use the index properly. Obviously, when the number of jobs is very high, it becomes very slow and the CPU hits 100%.

We use agenda 4.2.1.

Is there any viable workaround or alternative? This is really a big issue for us.

-- this is the explain:

db.getCollection("agenda").find({
        $and: [
            { 
                name: "SomeJobHandler",
                disabled: { $ne: true } 
            },
            {
                $or: [
                    {
                        lockedAt: { $eq: null },
                        nextRunAt: { $lte: new ISODate("2022-11-02T22:03:36.089Z") }
                    },
                    { 
                        lockedAt: { $lte: new ISODate("2022-11-02T21:53:34.090Z") } 
                    }
                ]
            }
        ]
    }).explain()

And this is the output. Note that in the winning plan, part of the filter is not resolved by the IXSCAN; that (we think) causes the issue:

{
    "queryPlanner" : {
        "plannerVersion" : 1.0,
        "namespace" : "mydb.agenda",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "$or" : [
                        {
                            "$and" : [
                                {
                                    "lockedAt" : {
                                        "$eq" : null
                                    }
                                },
                                {
                                    "nextRunAt" : {
                                        "$lte" : ISODate("2022-11-02T22:03:36.089+0000")
                                    }
                                }
                            ]
                        },
                        {
                            "lockedAt" : {
                                "$lte" : ISODate("2022-11-02T21:53:34.090+0000")
                            }
                        }
                    ]
                },
                {
                    "name" : {
                        "$eq" : "SomeJobHandler"
                    }
                },
                {
                    "$nor" : [
                        {
                            "disabled" : {
                                "$eq" : true
                            }
                        }
                    ]
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "$or" : [
                    {
                        "$and" : [
                            {
                                "lockedAt" : {
                                    "$eq" : null
                                }
                            },
                            {
                                "nextRunAt" : {
                                    "$lte" : ISODate("2022-11-02T22:03:36.089+0000")
                                }
                            }
                        ]
                    },
                    {
                        "lockedAt" : {
                            "$lte" : ISODate("2022-11-02T21:53:34.090+0000")
                        }
                    }
                ]
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "name" : 1.0,
                    "nextRunAt" : 1.0,
                    "priority" : -1.0,
                    "lockedAt" : 1.0,
                    "disabled" : 1.0
                },
                "indexName" : "findAndLockNextJobIndex",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "name" : [

                    ],
                    "nextRunAt" : [

                    ],
                    "priority" : [

                    ],
                    "lockedAt" : [

                    ],
                    "disabled" : [

                    ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2.0,
                "direction" : "forward",
                "indexBounds" : {
                    "name" : [
                        "[\"SomeJobHandler\", \"SomeJobHandler\"]"
                    ],
                    "nextRunAt" : [
                        "[MinKey, MaxKey]"
                    ],
                    "priority" : [
                        "[MaxKey, MinKey]"
                    ],
                    "lockedAt" : [
                        "[MinKey, MaxKey]"
                    ],
                    "disabled" : [
                        "[MinKey, true)",
                        "(true, MaxKey]"
                    ]
                }
            }
        },
        "rejectedPlans" : [
            {
                "stage" : "FETCH",
                "inputStage" : {
                    "stage" : "OR",
                    "inputStages" : [
                        {
                            "stage" : "FETCH",
                            "filter" : {
                                "lockedAt" : {
                                    "$eq" : null
                                }
                            },
                            "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                    "name" : 1.0,
                                    "nextRunAt" : 1.0,
                                    "priority" : -1.0,
                                    "lockedAt" : 1.0,
                                    "disabled" : 1.0
                                },
                                "indexName" : "findAndLockNextJobIndex",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                    "name" : [

                                    ],
                                    "nextRunAt" : [

                                    ],
                                    "priority" : [

                                    ],
                                    "lockedAt" : [

                                    ],
                                    "disabled" : [

                                    ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2.0,
                                "direction" : "forward",
                                "indexBounds" : {
                                    "name" : [
                                        "[\"SomeJobHandler\", \"SomeJobHandler\"]"
                                    ],
                                    "nextRunAt" : [
                                        "(true, new Date(1667426616089)]"
                                    ],
                                    "priority" : [
                                        "[MaxKey, MinKey]"
                                    ],
                                    "lockedAt" : [
                                        "[null, null]"
                                    ],
                                    "disabled" : [
                                        "[MinKey, true)",
                                        "(true, MaxKey]"
                                    ]
                                }
                            }
                        },
                        {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "name" : 1.0,
                                "nextRunAt" : 1.0,
                                "priority" : -1.0,
                                "lockedAt" : 1.0,
                                "disabled" : 1.0
                            },
                            "indexName" : "findAndLockNextJobIndex",
                            "isMultiKey" : false,
                            "multiKeyPaths" : {
                                "name" : [

                                ],
                                "nextRunAt" : [

                                ],
                                "priority" : [

                                ],
                                "lockedAt" : [

                                ],
                                "disabled" : [

                                ]
                            },
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 2.0,
                            "direction" : "forward",
                            "indexBounds" : {
                                "name" : [
                                    "[\"SomeJobHandler\", \"SomeJobHandler\"]"
                                ],
                                "nextRunAt" : [
                                    "[MinKey, MaxKey]"
                                ],
                                "priority" : [
                                    "[MaxKey, MinKey]"
                                ],
                                "lockedAt" : [
                                    "(true, new Date(1667426014090)]"
                                ],
                                "disabled" : [
                                    "[MinKey, true)",
                                    "(true, MaxKey]"
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    },
    "serverInfo" : {
        "host" : "a8e4e20046f0",
        "port" : 27017.0,
        "version" : "3.6.6",
        "gitVersion" : "6405d65b1d6432e138b44c13085d0c2fe235d6bd"
    },
    "ok" : 1.0
}
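
To make output like this easier to read, a small helper can walk the winning plan and list its stages, flagging any stage that still carries a residual filter. This is a rough sketch, not something from Agenda or the MongoDB driver: `summarizePlan` is a hypothetical name, and the plan object below is a trimmed-down copy of the one above.

```javascript
// Hypothetical helper: walks a queryPlanner.winningPlan from .explain()
// and lists each stage, flagging stages that carry a residual "filter".
function summarizePlan(plan, depth = 0, out = []) {
  if (!plan) return out;
  out.push({
    depth,
    stage: plan.stage,
    indexName: plan.indexName || null,
    hasFilter: Boolean(plan.filter)
  });
  // A stage has either a single inputStage or an inputStages array (e.g. OR).
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  for (const child of children) summarizePlan(child, depth + 1, out);
  return out;
}

// Trimmed-down copy of the winning plan shown above (filter contents omitted).
const winningPlan = {
  stage: 'FETCH',
  filter: { $or: [] },
  inputStage: { stage: 'IXSCAN', indexName: 'findAndLockNextJobIndex' }
};

for (const s of summarizePlan(winningPlan)) {
  const index = s.indexName ? ' ' + s.indexName : '';
  const note = s.hasFilter ? ' (residual filter: documents are re-checked after the index scan)' : '';
  console.log('  '.repeat(s.depth) + s.stage + index + note);
}
```

Running .explain("executionStats") instead of the default mode additionally reports totalKeysExamined, totalDocsExamined and nReturned, which show how much work the residual filter causes in practice.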

jhmilan avatar Nov 03 '22 13:11 jhmilan

Honestly, you should consider another library. After using Agenda for years which has bugs or at least bad docs I switched to BullMQ. It’s been really good.

wootwoot1234 avatar Nov 03 '22 21:11 wootwoot1234

@wootwoot1234 Totally agree, I used this library for 3 years and the sad reality is that it's not reliable and Bull works much better.

axelmarciano avatar Nov 03 '22 21:11 axelmarciano

Thanks for the feedback @wootwoot1234 and @axelmarciano

This is very sad news, to be honest. We are thinking about some creative solutions to work around the issue, but it is really annoying that it cannot simply be fixed properly in the library by fixing the query/indices.

At least, the library docs should clearly warn about the limits and pitfalls. After having this in production for 4 years, this is really a big issue.

Guys, how can you move from agenda to Bull? As far as I know, it is a queue with some features like delays, etc. The main issue is that we need a reliable scheduling solution to make sure we run some stuff at a specific time, say, in 4 months. That includes timezones, etc. Is Bull an option for that?

Thanks in advance

jhmilan avatar Nov 03 '22 21:11 jhmilan

Hey @jhmilan, the migration to bull was very smooth for us despite the size of our service, with over a hundred thousand jobs per day: scheduled jobs, delayed jobs, recurring cron jobs, etc. Everything works fine and it's very reliable! There are a few pitfalls to avoid to prevent excessive use of Redis that could impact server resources, but you will figure it out quickly.

axelmarciano avatar Nov 03 '22 21:11 axelmarciano

Another, maybe easier way, is to use https://www.npmjs.com/package/@hokify/agenda it's a fork called agendaTS, that has all the known issues fixed!

simllll avatar Nov 03 '22 21:11 simllll

Another, maybe easier way, is to use https://www.npmjs.com/package/@hokify/agenda it's a fork called agendaTS, that has all the known issues fixed!

Interesting, thanks. I see it requires mongo v4 at least, but that would be ok if the issues are really fixed. Will check it out.

jhmilan avatar Nov 03 '22 22:11 jhmilan

@jhmilan Unfortunately there is currently a small number of "Active" maintainers including myself. We could use more people on the project since we are all working on agenda in our spare time.

Like @koresar said, you could find an ideal index and give it to us. Please open a PR so we can fix this issue.

harisvsulaiman avatar Nov 04 '22 06:11 harisvsulaiman

The folks over at Rocket chat have been using agenda for some time. I hope @simllll and the folks at rocket like @murtaza98 will bring over their fixes so the wider community can benefit, since this repo has the most stars and the largest number of downloads (which is growing).

harisvsulaiman avatar Nov 04 '22 06:11 harisvsulaiman

@harisvsulaiman I agree with the above statements: BullMQ is more stable and faster. The Agenda fork is also a good way to do things because in OSS sometimes it's beneficial.

I'd love to give agenda NPM publishing permissions to the AgendaTS maintainers. They seem to have done a great job! @simllll would you consider taking over Agenda and publishing your fork as the next MAJOR version [email protected]?

koresar avatar Nov 07 '22 03:11 koresar

@koresar I think @simllll should consider bringing over the full repo and its changes so that we can retain the star count.

This will help users identify the project.

harisvsulaiman avatar Nov 07 '22 04:11 harisvsulaiman

Thanks, I'm open to the idea of bringing the commits in here. Unfortunately my time is quite limited right now and we need to figure out how to approach it. Some ideas:

1.) Try to create a PR that is based on the current agenda master, and try to keep as many commits from agendaTS as possible to preserve the change history. In case of merge conflicts, keep the agendaTS code.
2.) Rename everything back to agenda.
3.) Improve the docs of new features like job forking (https://github.com/hokify/agenda#sandboxed-worker---use-child-processes).
4.) Write up the breaking changes and a transition guide (e.g. there is no save job result in agendaTS, and before bringing that into it, I would like to have a PR to discuss it, see also https://github.com/hokify/agenda/issues/29). There are also some other breaking changes like "define() config parameter moved from 2nd position to 3rd", and maybe even some more?

All in all, I'm more than grateful for your words and the chance to bring the main agenda back to life ;-) Thanks guys! But I would definitely need your support to accomplish that!

simllll avatar Nov 07 '22 22:11 simllll

I can't invest my time any more into this project unfortunately. But will help granting all the necessary permissions.

koresar avatar Nov 08 '22 11:11 koresar

Hi all,

I have done some more investigation and can shed some more light on the topic.

It turns out that the explains I put here are not very accurate. I'm not a MongoDB expert and cannot interpret 100% of the info returned there, but there are some conclusions that can be drawn, though:

  • MongoDB query planning (like that of other DBs) seems to use some heuristics, which leads to some degree of nondeterminism. The query I explained was using a wrong name that did not match any job in our DB, so it ended up as a FETCH with an IXSCAN inputStage, etc.
  • Using the proper job name leads to quite a good query plan with agenda v4. Our impression is that the index is very good and well designed, but 2 pitfalls can make the query very heavy under some circumstances.
  • Looking at the agenda fork that @simllll mentioned (thanks for that, and thanks in advance if you guys merge the fork and make it agenda v5), we don't really understand how the query in that fork solves the issue (or just helps). Any explanation would be helpful.
  • Bull, as some of you suggested, is unfortunately not a good replacement for Agenda (at least in our case). It is not a drop-in replacement because Bull, which is a very good library, is a queue. A queue with 'delay' options, but a queue. It allows you to 'schedule' based on cron-like config, but it does not provide the consistency that Agenda provides based on exact dates and MongoDB, instead of a given number of ms and Redis storage (with all due respect to Redis). If you want to schedule something 4 months from now and you want consistency, agenda is way better IMO.

What is wrong with the query, then?

You would expect the query that picks/locks the next job to run to be a 'covered' query, namely, a query that runs entirely from an index and is thus very fast. This query has 3 important aspects to observe carefully:

  • The $or operator.
  • The lockedAt: {$eq: null} equality.
  • The sort.

The first bullet is sometimes problematic. The query planner sometimes struggles with $or operations and indexes, doing funny things. Luckily, it does not seem to be a problem here, but perhaps running 2 queries instead of 1, and thereby avoiding the $or, would be better. I mean 1 query for the stalled jobs and 1 query for the 'next' jobs.

The second bullet is a real issue. Even though the explain tells you the index is/should be hit, there is an unavoidable filter after the IXSCAN, which is probably caused by some MongoDB-by-design thingie. In short, when using $eq: null, MongoDB can treat the index match as inexact, and after using the index it will filter/re-check the documents. If the number of such documents is very high, this can be very slow. This seems to be our issue here, since it only happens when the number of (selectable) jobs is > 100K (more or less). The solution forces you to change the data model, since we should not represent unlocked with lockedAt: null but with a flag instead (isLocked). By doing that we would not need the null equality, and the query should/would be a fully covered query.

The third thing, the sort, should not be an issue given the current index (it seems a SORT_MERGE is used, which is nice), but we suspect that after some threshold the query planner could be doing funny things, or the sort is just 'slow' when the number of elements to sort is too big. All this is pure speculation, though.
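
The two ideas above (one query per case instead of an $or, and a flag instead of the null equality) could look roughly like this. It is only a sketch under assumptions, not Agenda's implementation: `buildLockQueries` is a hypothetical helper, and `isLocked` is an assumed boolean field that would have to be written together with lockedAt, backfilled on existing documents, and added to the index.

```javascript
// Hypothetical sketch, not Agenda's implementation. `isLocked` is an assumed
// boolean field that would have to be maintained alongside lockedAt.
function buildLockQueries(jobName, nextScanAt, lockDeadline) {
  const common = { name: jobName, disabled: { $ne: true } };
  return {
    // Query 1: jobs that are due and not locked. The flag replaces
    // lockedAt: {$eq: null}, so no null equality is needed.
    dueJobs: { ...common, isLocked: false, nextRunAt: { $lte: nextScanAt } },
    // Query 2: stalled jobs whose lock is older than the deadline.
    stalledJobs: { ...common, isLocked: true, lockedAt: { $lte: lockDeadline } }
  };
}

const scanAt = new Date('2022-11-02T22:03:36.089Z');
const deadline = new Date('2022-11-02T21:53:34.090Z');
const { dueJobs, stalledJobs } = buildLockQueries('SomeJobHandler', scanAt, deadline);
console.log(JSON.stringify(dueJobs));
console.log(JSON.stringify(stalledJobs));
```

The caller would try the first filter and fall back to the second (or the other way round); which order is preferable, and whether the extra round trip pays off, would need measuring.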

Lines of thinking

Since we cannot go for Bull, as I explained, and we also cannot be sure that agenda v5 (the fork) will completely fix the issue, and moreover, since the problem arises only after the number of jobs with a certain job name passes some threshold, our idea for now is simple: make sure the number of jobs per job name does not grow beyond some limit.

To do this we will do some sharding: for some jobName, instead of calling it 'X' we will call it 'X_0', 'X_1'... up to N (number of shards). All these job names will run the same handler, so in the end the effect is the same, but instead of 1 (potentially) slow query you will have N very fast queries.
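
A minimal sketch of that naming scheme, assuming a fixed shard count and a simple string hash to pick the shard. All names here (`NUM_SHARDS`, `shardNames`, `shardedJobName`) are illustrative and not part of Agenda.

```javascript
// Hypothetical sketch of the job-name sharding idea; names are illustrative.
const NUM_SHARDS = 4;

// All job names that share one handler: X_0 ... X_(N-1).
function shardNames(baseName, numShards = NUM_SHARDS) {
  return Array.from({ length: numShards }, (_, i) => `${baseName}_${i}`);
}

// Pick a shard deterministically from a key (e.g. an entity id), using a
// simple string hash so the same key always maps to the same job name.
function shardedJobName(baseName, key, numShards = NUM_SHARDS) {
  let hash = 0;
  for (const ch of String(key)) {
    // Keep the running hash within unsigned 32-bit range.
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return `${baseName}_${hash % numShards}`;
}

console.log(shardNames('X'));
console.log(shardedJobName('X', 'order-42'));
```

Each name returned by shardNames would be registered with the same handler (for example via agenda.define), and new jobs would be scheduled under the name returned by shardedJobName.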

We know this is far from an ideal solution, but (we hope) it should prevent the problem from happening, if all our assumptions and research are correct and this is caused by this MongoDB pitfall, which is mainly visible with a high number of results.

Newer versions of agenda (I believe) should consider removing the null equality completely and, maybe, also splitting the $or into 2 queries. That should be a long-term fix for this issue.

Hope all this 'brick' makes sense to someone. We will share here whether the solution really works as expected.

jhmilan avatar Nov 09 '22 23:11 jhmilan

@jhmilan I think this could be solved by adding a benchmark test to GitHub Actions that reports when performance regresses.

harisvsulaiman avatar Nov 10 '22 13:11 harisvsulaiman