edison-microservice
[edison-jobs] Inconsistencies in job locks
We found some inconsistencies in the lock handling of edison-jobs.
Sometimes we receive the following log messages:
- Clear Lock of Job JobName. Job stopped already
- Clear Lock of Job JobName. JobID does not exist
Both messages come from this method in the JobService:
/**
 * Checks all run locks and releases the lock, if the job is stopped.
 *
 * TODO: This method should never do something, otherwise the is a bug in the lock handling.
 * TODO: Check Log files + Remove
 */
private void clearRunLocks() {
    jobMetaService.runningJobs().forEach((RunningJob runningJob) -> {
        final Optional<JobInfo> jobInfoOptional = jobRepository.findOne(runningJob.jobId);
        if (jobInfoOptional.isPresent() && jobInfoOptional.get().isStopped()) {
            jobMetaService.releaseRunLock(runningJob.jobType);
            LOG.error("Clear Lock of Job {}. Job stopped already.", runningJob.jobType);
        } else if (!jobInfoOptional.isPresent()) {
            jobMetaService.releaseRunLock(runningJob.jobType);
            LOG.error("Clear Lock of Job {}. JobID does not exist", runningJob.jobType);
        }
    });
}
The method is marked with a TODO saying that it should never have to do anything. We currently have no idea how this can happen. We have observed it with both the DynamoDB and the MongoDB implementations.
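To make the two branches easier to discuss, here is a minimal, self-contained sketch of the states in which the method quoted above releases a lock. It does not use the edison-jobs classes: the class `ClearRunLocksSketch` and the two in-memory maps are hypothetical stand-ins for the JobMetaService lock table and the JobRepository, and the sketch says nothing about how the real stores end up in these states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model, NOT the edison-jobs implementation.
public class ClearRunLocksSketch {

    // jobType -> jobId of the run holding the lock (stand-in for the lock table).
    final Map<String, String> runLocks = new LinkedHashMap<>();
    // jobId -> "is stopped" flag (stand-in for the JobInfo documents).
    final Map<String, Boolean> jobs = new HashMap<>();

    // Mirrors the two branches of clearRunLocks() and returns the messages
    // that would have been logged.
    List<String> clearRunLocks() {
        List<String> messages = new ArrayList<>();
        // Iterate over a copy so that locks can be removed while looping.
        new LinkedHashMap<>(runLocks).forEach((jobType, jobId) -> {
            Optional<Boolean> stopped = Optional.ofNullable(jobs.get(jobId));
            if (stopped.isPresent() && stopped.get()) {
                // Lock is still held although the job has already stopped.
                runLocks.remove(jobType);
                messages.add("Clear Lock of Job " + jobType + ". Job stopped already.");
            } else if (!stopped.isPresent()) {
                // Lock points to a job ID that is not in the repository.
                runLocks.remove(jobType);
                messages.add("Clear Lock of Job " + jobType + ". JobID does not exist");
            }
        });
        return messages;
    }

    public static void main(String[] args) {
        ClearRunLocksSketch sketch = new ClearRunLocksSketch();
        sketch.runLocks.put("StoppedJob", "id-1");
        sketch.jobs.put("id-1", true);   // stopped, but lock never released
        sketch.runLocks.put("MissingJob", "id-2"); // no JobInfo for id-2
        sketch.runLocks.put("RunningJob", "id-3");
        sketch.jobs.put("id-3", false);  // consistent: still running

        sketch.clearRunLocks().forEach(System.out::println);
        System.out.println("Remaining locks: " + sketch.runLocks.keySet());
    }
}
```

In this model, only the lock of the job that is still running survives the cleanup. The first state means a lock outlived the stop of its job; the second means a lock exists for a job ID the repository does not know. Which of these the DynamoDB and MongoDB implementations actually reach, and why, is still the open question.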