edison-microservice

[edison-jobs] Inconsistencies in job locks

Open mgeissen opened this issue 4 years ago • 0 comments

We found some inconsistencies in the lock handling of edison-jobs.

Sometimes we see the following log messages:

  • Clear Lock of Job JobName. Job stopped already
  • Clear Lock of Job JobName. JobID does not exist

Both messages come from this method in the JobService:

  /**
   * Checks all run locks and releases the lock, if the job is stopped.
   *
   * TODO: This method should never do something, otherwise the is a bug in the lock handling.
   * TODO: Check Log files + Remove
   */
  private void clearRunLocks() {
      jobMetaService.runningJobs().forEach((RunningJob runningJob) -> {
          final Optional<JobInfo> jobInfoOptional = jobRepository.findOne(runningJob.jobId);
          if (jobInfoOptional.isPresent() && jobInfoOptional.get().isStopped()) {
              jobMetaService.releaseRunLock(runningJob.jobType);
              LOG.error("Clear Lock of Job {}. Job stopped already.", runningJob.jobType);
          } else if (!jobInfoOptional.isPresent()){
              jobMetaService.releaseRunLock(runningJob.jobType);
              LOG.error("Clear Lock of Job {}. JobID does not exist", runningJob.jobType);
          }
      });
  }

The method is marked with a TODO stating that it should never have to do anything, so every one of these messages points to a bug in the lock handling. We currently have no idea how this can happen. We see it with both the DynamoDB and the MongoDB implementations.
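To make the two cases easier to reason about, the check can be modelled in isolation. The following is only a minimal sketch, assuming a run lock is a mapping from job type to job ID and the stored job state is a mapping from job ID to a stopped flag. `StaleLockCheck`, `Reason` and `findStaleLocks` are hypothetical names for illustration and are not part of edison-jobs; the sketch does not touch DynamoDB or MongoDB and does not explain how the stale locks arise.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of the check performed by clearRunLocks().
// None of these names exist in edison-jobs.
class StaleLockCheck {

    // The two situations that clearRunLocks() logs as errors.
    enum Reason { JOB_STOPPED, JOB_ID_MISSING }

    /**
     * @param runLocks   jobType -> jobId of the job holding the run lock (assumed shape)
     * @param jobStopped jobId -> true if the stored job is stopped (assumed shape)
     * @return jobType -> reason for every lock that the cleanup would release
     */
    static Map<String, Reason> findStaleLocks(Map<String, String> runLocks,
                                              Map<String, Boolean> jobStopped) {
        final Map<String, Reason> stale = new LinkedHashMap<>();
        runLocks.forEach((jobType, jobId) -> {
            final Optional<Boolean> stopped = Optional.ofNullable(jobStopped.get(jobId));
            if (!stopped.isPresent()) {
                // Lock refers to a job ID that is not stored at all.
                stale.put(jobType, Reason.JOB_ID_MISSING);
            } else if (stopped.get()) {
                // Job has finished, but its lock was never released.
                stale.put(jobType, Reason.JOB_STOPPED);
            }
            // Otherwise the job is still running and the lock is legitimate.
        });
        return stale;
    }

    public static void main(String[] args) {
        final Map<String, String> runLocks = new LinkedHashMap<>();
        runLocks.put("ImportJob", "job-1");   // still running
        runLocks.put("ExportJob", "job-2");   // stopped, lock left behind
        runLocks.put("CleanupJob", "job-3");  // no stored job for this ID

        final Map<String, Boolean> jobStopped = new LinkedHashMap<>();
        jobStopped.put("job-1", false);
        jobStopped.put("job-2", true);

        System.out.println(findStaleLocks(runLocks, jobStopped));
    }
}
```

Returning the reason per job type instead of only logging it would make it possible to assert on both cases in a test against each repository implementation.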

mgeissen, Feb 17 '22 08:02