Coworker icon indicating copy to clipboard operation
Coworker copied to clipboard

[Question] Is it possible that sometimes the same delayed_work record is processed more than once?

Open RMISCG opened this issue 4 years ago • 7 comments

Description

Is it possible that sometimes the same delayed_work record is processed more than once?

From the testing we've been able to do so far, it seems like this is an intermittent issue. Also, I haven't encountered a scenario where the same record was processed more than twice. I have created a sample Spring Boot project here https://github.com/RMISCG/queue which you can use for testing purposes. The project has a README.md file in the parent directory for more details.

Additional Information

  • Framework Version: Java: 11.0.4 Spring Boot: 2.2.5.RELEASE Gradle: 5.6

  • Platform: macOS Mojave Version 10.14.6

  • Command/Work You Tried to Run:

  1. cd into the parent directory.
  2. ./gradlew clean
  3. ./gradlew build
  4. ./gradlew bootRun
  • Any Output you Recieved (record with state 09 processed twice):
2020-03-03 12:58:35.846  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Starting QueueApplication on ml0179620 with PID 34280 (/Users/rmis/dev/apps-external/queue/build/classes/java/main started by rmis in /Users/rmis/dev/apps-external/queue)
2020-03-03 12:58:35.849  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : No active profile set, falling back to default profiles: default
2020-03-03 12:58:36.413  INFO 34280 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2020-03-03 12:58:36.993  INFO 34280 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2020-03-03 12:58:37.499  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 01 and result 01
2020-03-03 12:58:37.610  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 02 and result 02
2020-03-03 12:58:37.722  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 03 and result 03
2020-03-03 12:58:37.834  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 04 and result 04
2020-03-03 12:58:37.949  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 05 and result 05
2020-03-03 12:58:38.060  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 06 and result 06
2020-03-03 12:58:38.172  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 07 and result 07
2020-03-03 12:58:38.282  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 08 and result 08
2020-03-03 12:58:38.393  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 09 and result 09
2020-03-03 12:58:38.504  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 10 and result 10
2020-03-03 12:58:38.616  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 11 and result 11
2020-03-03 12:58:38.729  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 12 and result 12
2020-03-03 12:58:38.840  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 13 and result 13
2020-03-03 12:58:38.951  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 14 and result 14
2020-03-03 12:58:39.065  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 15 and result 15
2020-03-03 12:58:39.177  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 16 and result 16
2020-03-03 12:58:39.361  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 17 and result 17
2020-03-03 12:58:39.478  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 18 and result 18
2020-03-03 12:58:39.588  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 19 and result 19
2020-03-03 12:58:39.700  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 20 and result 20
2020-03-03 12:58:39.813  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 21 and result 21
2020-03-03 12:58:39.926  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 22 and result 22
2020-03-03 12:58:40.038  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 23 and result 23
2020-03-03 12:58:40.151  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 24 and result 24
2020-03-03 12:58:40.262  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 25 and result 25
2020-03-03 12:58:40.373  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 26 and result 26
2020-03-03 12:58:40.485  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 27 and result 27
2020-03-03 12:58:40.599  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 28 and result 28
2020-03-03 12:58:40.712  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 29 and result 29
2020-03-03 12:58:40.832  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 30 and result 30
2020-03-03 12:58:40.947  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 31 and result 31
2020-03-03 12:58:41.061  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 32 and result 32
2020-03-03 12:58:41.172  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 33 and result 33
2020-03-03 12:58:41.283  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 34 and result 34
2020-03-03 12:58:41.395  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 35 and result 35
2020-03-03 12:58:41.506  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 36 and result 36
2020-03-03 12:58:41.620  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 37 and result 37
2020-03-03 12:58:41.737  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 38 and result 38
2020-03-03 12:58:41.849  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 39 and result 39
2020-03-03 12:58:41.963  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 40 and result 40
2020-03-03 12:58:42.075  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 41 and result 41
2020-03-03 12:58:42.186  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 42 and result 42
2020-03-03 12:58:42.299  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 43 and result 43
2020-03-03 12:58:42.411  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 44 and result 44
2020-03-03 12:58:42.523  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 45 and result 45
2020-03-03 12:58:42.634  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 46 and result 46
2020-03-03 12:58:42.746  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 47 and result 47
2020-03-03 12:58:42.865  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 48 and result 48
2020-03-03 12:58:42.975  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 49 and result 49
2020-03-03 12:58:43.088  INFO 34280 --- [           main] com.rss.queue.QueueApplication           : Created com.rss.queue.job.EchoJob with state 50 and result 50
2020-03-03 12:58:43.366  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 01
2020-03-03 12:58:43.442  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 50
2020-03-03 12:58:43.519  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 49
2020-03-03 12:58:43.596  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 48
2020-03-03 12:58:43.673  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 47
2020-03-03 12:58:44.118  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 46
2020-03-03 12:58:44.192  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 45
2020-03-03 12:58:44.266  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 44
2020-03-03 12:58:44.341  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 43
2020-03-03 12:58:44.421  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 42
2020-03-03 12:58:44.867  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 41
2020-03-03 12:58:44.944  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 40
2020-03-03 12:58:45.020  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 39
2020-03-03 12:58:45.105  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 38
2020-03-03 12:58:45.188  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 37
2020-03-03 12:58:45.696  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 36
2020-03-03 12:58:45.771  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 35
2020-03-03 12:58:45.846  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 34
2020-03-03 12:58:45.922  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 33
2020-03-03 12:58:45.996  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 32
2020-03-03 12:58:46.510  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 31
2020-03-03 12:58:46.587  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 30
2020-03-03 12:58:46.665  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 29
2020-03-03 12:58:46.741  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 28
2020-03-03 12:58:46.819  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 27
2020-03-03 12:58:47.269  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 26
2020-03-03 12:58:47.344  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 25
2020-03-03 12:58:47.418  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 24
2020-03-03 12:58:47.493  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 23
2020-03-03 12:58:47.569  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 22
2020-03-03 12:58:48.103  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 21
2020-03-03 12:58:48.179  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 20
2020-03-03 12:58:48.253  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 19
2020-03-03 12:58:48.327  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 18
2020-03-03 12:58:48.400  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 17
2020-03-03 12:58:48.931  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 16
2020-03-03 12:58:49.005  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 15
2020-03-03 12:58:49.079  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 14
2020-03-03 12:58:49.153  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 13
2020-03-03 12:58:49.232  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 12
2020-03-03 12:58:49.668  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 11
2020-03-03 12:58:49.751  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 10
2020-03-03 12:58:49.827  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 09
2020-03-03 12:58:49.904  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 08
2020-03-03 12:58:49.980  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 07
2020-03-03 12:58:50.495  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 06
2020-03-03 12:58:50.570  INFO 34280 --- [pool-1-thread-2] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 05
2020-03-03 12:58:50.648  INFO 34280 --- [pool-1-thread-3] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 04
2020-03-03 12:58:50.726  INFO 34280 --- [pool-1-thread-4] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 03
2020-03-03 12:58:50.800  INFO 34280 --- [pool-1-thread-5] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 02
2020-03-03 12:58:51.836  INFO 34280 --- [pool-1-thread-1] com.rss.queue.job.EchoJob                : Running com.rss.queue.job.EchoJob with state 09
Disconnected from the target VM, address: '127.0.0.1:59884', transport: 'socket'

Process finished with exit code 130 (interrupted by signal 2: SIGINT)

RMISCG avatar Mar 03 '20 22:03 RMISCG

Hey,

It is certainly possible that a job can be worked twice. Although, I don't think this is a case where it should be. The rework should only happen in a failure case where a job failed to report it's status to the DB. I'll take a bit of a deeper look here as to why a job is getting queued multiple times.

Mythra avatar Mar 03 '20 22:03 Mythra

I've run it 10 times, and haven't been able to replicate. Could you perhaps turn on the log level filtering off (I realize that'll be a lot of logs), and post there somewhere like gist.github.com so I can view the logs to see why it's being picked up again?

Mythra avatar Mar 03 '20 23:03 Mythra

here's a gist snippet: https://gist.github.com/RMISCG/90780e69d8e517d80b794ca3231860b1 you'll notice that record with state 07 was processed twice. search for string "Running com.rss.queue.job.EchoJob with state 07"

also, if you want to do this yourself, show all coworker logs, simply remove the following block from application.yml:

logging:
  level:
    io:
      kungfury:
        coworker: WARN

RMISCG avatar Mar 04 '20 00:03 RMISCG

Thanks for this! I do know how to change the logs, I just can't reproduce it unfortunately :smile:

So the logs actually show that the job gets released back to the pool:

  • https://gist.github.com/RMISCG/90780e69d8e517d80b794ca3231860b1#file-queuecoworkerlogs-txt-L226

The only places ReleaseToPool are called:

  • When a node first comes online it releases it's work that it previously had incase a job gets restarted while it was working: https://github.com/SecurityInsanity/Coworker/blob/d9e6aabb6d/src/main/kotlin/io/kungfury/coworker/CoworkerManager.kt#L104

  • If a service-checker is passed, and it reports a node as offline (this can't be called since the default service checker never reports any nodes as offline since it has no way of knowing): https://github.com/SecurityInsanity/Coworker/blob/d9e6aabb6d/src/main/kotlin/io/kungfury/coworker/CoworkerManager.kt#L121

  • If the future that is handling the work is cancelled: https://github.com/SecurityInsanity/Coworker/blob/d9e6aabb6d/src/main/kotlin/io/kungfury/coworker/CoworkerManager.kt#L219

  • If the work is marked as finished without finishWork being called: https://github.com/SecurityInsanity/Coworker/blob/d9e6aabb6d/src/main/kotlin/io/kungfury/coworker/CoworkerManager.kt#L235

  • If the NStrand is at maximum: https://github.com/SecurityInsanity/Coworker/blob/d9e6aabb6d/src/main/kotlin/io/kungfury/coworker/CoworkerManager.kt#L378

My biggest thought is that the future is cancelled. As none of the others make sense (and I can't seem to reproduce). Is there anything you can think of that would kill the thread/runnable? Perhaps using too much memory? Something along those lines?

Mythra avatar Mar 04 '20 00:03 Mythra

I'm not really sure why you're unable to reproduce this. Maybe you can:

  1. increase the number of delayed_work records - increase NUMBER_OF_JOBS in QueueApplication. AND/OR
  2. increase the number of coworker threads - increase THREADS in CoworkerConfiguration.

RMISCG avatar Mar 04 '20 00:03 RMISCG

Hi,

Any updates on this?

Thanks

RMISCG avatar Mar 10 '20 18:03 RMISCG

Hey,

I'm still not able to replicate this on my end. Even after increasing NUMBER_OF_JOBS && THREADS. Though I still think the future is getting cancelled (somehow). If you'd like debug further yourself I recommend adding in some debug lines on the line I pointed out where a job gets released to the pool for a future getting cancelled (as well as the others) to see where exactly the job is getting released back to the pool.

Mythra avatar Mar 10 '20 18:03 Mythra