bullmq icon indicating copy to clipboard operation
bullmq copied to clipboard

[Bug]: Scheduled Job Disappearing

Open digipigeon opened this issue 8 months ago • 2 comments

Version

5.46.1

Platform

NodeJS

What happened?

I am running Bull MQ 5 in multi-worker-server setup.

When a (single) Scheduled Task (dedicated queue) is executed the task disappears. This only seems to happen on a new server which is deployed. I am unable to isolate the difference.

See Redis Log.

TIA

How to reproduce.

No response

Relevant log output



Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

digipigeon avatar Apr 05 '25 17:04 digipigeon

The information you provide for us is too little, there is not action we can take based on this.

manast avatar Apr 05 '25 20:04 manast

Hello manast,

It looks like the attached log went missing (my fault). I will find and reattach.

digipigeon avatar Apr 07 '25 11:04 digipigeon

hi @digipigeon wondering if you are using "every" option

roggervalf avatar May 18 '25 06:05 roggervalf

This might be the same issue as #3272.

nwalters512 avatar May 20 '25 00:05 nwalters512

Sorry for the delay, I got side tracked, I will try to get a log today.

I am not sure if this is the same issue, it does not seem the same at present.

digipigeon avatar May 20 '25 06:05 digipigeon

Hello,

The job I am having problems with is called cron-alert, so I have ran the following to obtain all the commands related to this namespace

redis-cli MONITOR | grep alert

I then run my simple application, which creates a repeatable job to run every minute and also subscribes to that queue.

I successfully see that the repeatable job has been created in arena

I then see it disappear when the job starts.

After the job finishes there is still no (next) repeatable job created. So the repeatable job is effectively fully removed and it never triggers again.

Please let me know if you need any more information or you have anotherwise prefered way for me to log what is going on.

digipigeon avatar May 21 '25 11:05 digipigeon

@digipigeon any chance you could provide a snippet that reproduces this issue? that will be very helpful so that we can also provide a solution that addresses specifically your problem.

manast avatar May 21 '25 12:05 manast

Certainly, please remember that this works with no problems on our older (exact same code and libraries) server.

/*
TOML Config

[module.alert]
path = "./minute/alert"
cron = "* * * * *"
controller = "minute"

main.js


*/
const config            = require('toml').parse('' + require('fs').readFileSync('config.toml'));
const { Worker, Queue } = require('bullmq');
const path              = require('path');
const connection = JSON.parse(JSON.stringify(config.bull.redis));
const bullOpts          = {connection, prefix: 'bullmq5'}

Object.entries(config.module)
    .forEach(async ([mod, modConfig]) => {
        console.log('Starting', mod);
        const processorFile = path.join(__dirname, modConfig.path) + '.js';
        const q = new Queue('cron-' + mod, bullOpts);
        const worker = new Worker('cron-' + mod, processorFile, bullOpts);
        worker.on('completed', function(job) { // result
            // Do some external stuff
        });
        await q.add('cron-' + mod, {}, {repeat: {cron: modConfig.cron}, attempts: 1});
        q.clean(86400 * 7);
        q.clean(86400 * 7, 'failed');
        
        setInterval(() => {
            q.clean(86400 * 7);
            q.clean(86400  * 7, 'failed');
        }, 3600 * 1000);
    });

This calls an external file which has various processing in it, however nothing that touches bull

alert.js

module.exports = async function(job) {
	job.log('Starting Alert');

    // Iterator removed for brevity
        try {
            // Do Stuff
            job.log(`${type} to ${body.to} sent`);
            if (job && job.updateProgress && total) job.updateProgress(parseInt(idx / total.length * 100))
        } catch (e) {
    		job.log(e); // Re-looking at this, I probably should be changing this to e.message
        }
    // End of iterator

	job.log('Complete');
}

I have stripped out the other bits of logic and only included what happens with anything related to bullmq.

digipigeon avatar May 22 '25 08:05 digipigeon

I just re-tested by replacing job.log(e); with job.log(e.message); and it made no difference.

Additionally just to mention, we run 27 of these scheduled jobs, all have different logic inside them different repetitions, they are created and ran from the same code block as above, all present the same failure modality on the new server.

digipigeon avatar May 22 '25 08:05 digipigeon

job.log is an asynchronous call, so you must await it: await job.log('Starting Alert');

manast avatar May 22 '25 21:05 manast

Hi @manast sorry I just wanted to confirm something...

Most of the time I am building applications for performance and waiting for logging to write is not something that I am really interested in doing.

Aside from not knowing if the log write fails or not, when you say that I must wait for it. Is there some other good reason for this?

Thanks in advance.

digipigeon avatar May 23 '25 14:05 digipigeon

Hello,

I have just noticed the change to 5.53.1 for the closure of https://github.com/taskforcesh/bullmq/pull/3269. I have just updated and ran a test and I am sorry to confirm that this does NOT fix the bug.

digipigeon avatar May 30 '25 09:05 digipigeon

Aside from not knowing if the log write fails or not, when you say that I must wait for it. Is there some other good reason for this?

hi @digipigeon. Log method is an asynchronous method. Also it validated that a job exits before trying to add a log record, if job does not exist at the time that it's being executed, it'll throw an error

roggervalf avatar May 30 '25 23:05 roggervalf

could you also try to use upsertJobScheduler methods and let us know how it goes https://docs.bullmq.io/guide/job-schedulers

roggervalf avatar May 30 '25 23:05 roggervalf

Hello, this does not appear to help the situation, the same problem happens.

digipigeon avatar Jun 09 '25 22:06 digipigeon

hi @digipigeon, one fix was deployed few days ago in v5.53.1, pls try it and let us know

roggervalf avatar Jun 10 '25 02:06 roggervalf

Hello,

I have just noticed the change to 5.53.1 for the closure of #3269. I have just updated and ran a test and I am sorry to confirm that this does NOT fix the bug.

@roggervalf

cgtarmenta avatar Jun 21 '25 17:06 cgtarmenta

Hello, I have just noticed the change to 5.53.1 for the closure of #3269. I have just updated and ran a test and I am sorry to confirm that this does NOT fix the bug.

@roggervalf

Which test is not working? The ones we managed to reproduce are all in the test suite and passing. If you have a different case that is not passing, please provide us with the code.

manast avatar Jun 21 '25 17:06 manast

Hi @manast sorry I just wanted to confirm something...

Most of the time I am building applications for performance and waiting for logging to write is not something that I am really interested in doing.

Aside from not knowing if the log write fails or not, when you say that I must wait for it. Is there some other good reason for this?

Thanks in advance.

As a rule of thumb for writing robust code, please await asynchronous calls, so that you avoid unexpected results down the road. If you are not awaiting you have no idea when that call is going to be processed, potentially leading to undesired side effects.

manast avatar Jun 21 '25 17:06 manast

closing as we do not have any reproducible test code nor any feedback to our latest replies.

manast avatar Jul 09 '25 09:07 manast