System - Task stuck in pending state

Open dvstans opened this issue 5 years ago • 4 comments

While testing large, concurrent create/delete operations, a delete task became stuck in the pending state. Restarting the core service caused the task to run and complete correctly. In addition, a single collection was not deleted, possibly related to the data records.

dvstans avatar Oct 19 '20 18:10 dvstans

I don't understand the exact problem. From the description, it seems like there is more than one:

  1. Reliably and consistently handling a large number of concurrent requests
  2. Handling a request that is partially complete
  3. Resolving a partially complete request automatically

I don't know what this means or how it pertains to the problem:

"In addition, a single collection was not deleted"

Are collections not being deleted correctly?

JoshuaSBrown avatar Dec 27 '22 18:12 JoshuaSBrown

When a collection is deleted, all contained collections should be deleted too, but in this case one somehow survived, which is a bug. The delete task should not get stuck; I think this was due to very heavy load on the DB. This was basically a stress-test scenario, and the system did not handle it well. I don't think any action can be taken on this issue until we have a way to reproduce it with controlled stress testing.
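
For illustration only, the expected behavior (no contained collection survives a delete) could be checked roughly like this. This is a minimal sketch against a hypothetical in-memory `CollectionTree`, not the DataFed API or its database schema; the class, its methods, and the `survivors` helper are all assumptions made up for the example.

```python
from collections import defaultdict


class CollectionTree:
    """Hypothetical in-memory stand-in for a collection hierarchy (not the DataFed API)."""

    def __init__(self):
        self.children = defaultdict(set)  # parent id -> ids of direct child collections
        self.parent = {}                  # collection id -> parent id (None for a root)

    def create(self, coll_id, parent_id=None):
        self.parent[coll_id] = parent_id
        if parent_id is not None:
            self.children[parent_id].add(coll_id)

    def descendants(self, coll_id):
        """Return every collection contained in coll_id, at any depth."""
        found, stack = set(), [coll_id]
        while stack:
            for child in self.children.get(stack.pop(), ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found

    def delete(self, coll_id):
        """Delete coll_id and everything it contains; return the ids removed."""
        doomed = self.descendants(coll_id) | {coll_id}
        for cid in doomed:
            parent_id = self.parent.pop(cid, None)
            self.children.pop(cid, None)
            if parent_id is not None and parent_id in self.children:
                self.children[parent_id].discard(cid)
        return doomed


def survivors(tree, expected_deleted):
    """Ids that should be gone after a delete but are still present."""
    return sorted(cid for cid in expected_deleted if cid in tree.parent)
```

A stress test would record `descendants(root) | {root}` before issuing the delete and then assert that `survivors(...)` is empty afterwards; a non-empty result corresponds to the surviving collection described above.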

dvstans avatar Dec 27 '22 18:12 dvstans

I see, so the prereq for this issue is to develop a stress test suite and environment.
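
As a rough starting point, a controlled stress test could look something like the sketch below. It assumes a hypothetical `FakeStore` standing in for the real service (so it will not reproduce the bug by itself), and `run_stress`, the id format, and the worker counts are placeholders rather than anything from DataFed. The idea is only to show the shape of the harness: issue many concurrent creates, then many concurrent deletes, then report whatever is left over.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Hypothetical thread-safe stand-in for the service under test."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = set()

    def create(self, item_id):
        with self._lock:
            self._items.add(item_id)

    def delete(self, item_id):
        with self._lock:
            self._items.discard(item_id)

    def remaining(self):
        with self._lock:
            return set(self._items)


def run_stress(store, n_items=500, workers=16):
    """Create then delete n_items concurrently; return ids that were not deleted."""
    ids = [f"c/{i}" for i in range(n_items)]  # made-up id format
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every call to finish and re-raises any worker exception.
        list(pool.map(store.create, ids))
        list(pool.map(store.delete, ids))
    return store.remaining()
```

Against a real deployment, `FakeStore` would be replaced by a thin wrapper around the actual client calls, and the harness would also need a timeout so that a task stuck in the pending state is reported as a failure instead of hanging the run.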

JoshuaSBrown avatar Dec 27 '22 18:12 JoshuaSBrown

FYI, this issue may be related to the recent issue with the Lehigh repository. I suspect there is an edge case with task scheduling that is causing valid ready tasks to never run.

dvstans avatar Jan 26 '23 15:01 dvstans