meteor-job-collection Strange behavior with repeating jobs of the same type

Hello @vsivsi , I am currently finding a strange behaviour and maybe you can shed some light on the situation:

These are all the configured settings: Meteor 1.4.3.2 job-collection 1.5.1 server: Galaxy

JobCollection promote: every 30 min

Queue processJobs concurrency: 1, payload: 2, pollInterval: 1 minute, prefetch: 2, callbackStrict: true, workTimeout: 1 hour

Jobs of type TT priority : 'normal' repeat: schedule value "every 24 hours" retry : wait value 15 minutes, backoff: "constant"

I create 3 Jobs of the same type "TT" and what I am expecting to see is that when they are created, they are set to run for the first time on the following day at 00:00 (that works correctly), and then, that they will be promoted to ready as late as 00:30 and than they'll be fetched by the queue in groups of 2 starting 1 minute later. In this case since all 3 should become ready at the same time (no dependencies) the prefetch would make the third one also be fetched while processing the first 2. After being processed and completed next job should be created and scheduled for next day at 00:00. Intead, this is what I see:

I don't get why:

~~the after value for the first repeating jobs is set to the same day at 00:03:06 instead of 00:00:00 of next day (for 2 of them)~~
~~Why this happens again (settings the wrong after value) also for one of the jobs the next time both jobs that were set to after 00:03:06 are processed.~~

Update: Ok, so I have checked and it seems that whenever the job changes to ready, the after value gets updated, so the problem is why the "after" date is set to an incorrect value whenever the first job is done. I assume this happened since two of the jobs("mubJZ.....", and "rWNED......" were processed before the 24 at 00:00:00 and that's when they were supposed to run again like "eX6Q......")

Update2: I have seen this problem #217 and the workaround seems to be implemented in job save,

schedule = @later?.schedule(doc.repeatWait)
 +      unless schedule and next = schedule.next(2, schedule.prev(1, doc.after))[1]

but if you look at job.done function later is also used but without that fix

          next = @later?.schedule(doc.repeatWait).next(2)
          if next and next.length > 0
            d = new Date(next[0])
            if (d - time > 500) or (next.length > 1)
              if d - time <= 500
                d = new Date(next[1])
              else
              wait = d - time
              if doc.repeatUntil - wait >= time
                jobId = @_rerun_job doc, doc.repeats - 1, wait

~~Not sure if that has anything to do with the problem, did you check that part too @mitar ? (the 500 value seems a little weird to me)~~

Update3: I have verified that the problem comes from the previous code at job.done. If the server/db is busy and you set it like me to "every 24 hours", it is possible that the time of next[0] gets translated simply into a new Date() because of a later.js bug (if it gets done during the same 00 hour in my case) and this value can have a difference of more than 500 ms with the time when the variable time is created (db is busy or slow). In that case you stay with the value of next[0] which is the wrong date :(

I think the job.save workaround can be used also here to avoid this kind of problem.

Thanks for any update on this matter.

May 23 '17 13:05 tcastelli

Hi, just so you know, I'm about to drop off the internet for almost a month, so I won't be able to look into this until around June 20th. Sorry for any inconvenience.

May 25 '17 18:05 vsivsi

@tcastelli: Care to make a PR with a followup to #218 and with a test for what are you reporting here?

May 26 '17 10:05 mitar

I can try, but not really into coffeescript so a revision must be done by someone who is more fluent on that. I think by just copying and pasting your workaround would make it work for the other case too. For now, what we have done is just use "every 24hours at XX:XX" being xx:xx the time delimitation, making it harder for the error to happen (It should be done and finished within the same second)

May 29 '17 13:05 tcastelli

CoffeeScript will not bite you. :-) Try copying as you are suggesting and see if it works.

May 29 '17 20:05 mitar

meteor-job-collection meteor-job-collection copied to clipboard

Strange behavior with repeating jobs of the same type

meteor-job-collection
meteor-job-collection copied to clipboard