bugfix: getLastSuccessfulProcessRecord not taking processingDirectory

Open · angadsingh opened this issue 11 years ago · 2 comments

This is needed, for example, to run hRaven idempotently for each day's or shard's run, so that rerunning a shard's run makes hRaven simply resume instead of reprocessing everything.

angadsingh · Jun 29 '14 11:06

These contributions are great, and we're starting to look through them. Can you elaborate on how the output directory changes for you?

The way we run it, the output for a particular cluster is always the same, so filtering on that directory would not really help. hRaven can already be rerun without processing things again, since the ProcessingRecord keeps track of the high-water mark (the latest job ID that finished properly). When one of the jobs in a batch has a problem, that ProcessingRecord stays in that state, but the other processing records are skipped. Even when a processing record is run again, not all jobs have to be reloaded. When a long-running job completes, "gaps" can appear, in the sense that an older job ID now becomes the high-water mark, but the already processed records are neatly skipped.
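
The resume behavior described above can be sketched roughly as follows. This is a hypothetical illustration, not hRaven's ProcessingRecord code: it assumes a batch corresponds to one processing record, that job IDs compare correctly as strings (equal-length IDs), and that the loader remembers which job IDs it has already loaded. `HighWaterMarkLoader` and `runBatch` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of high-water-mark resume logic; not hRaven's actual API. */
class HighWaterMarkLoader {
    // Job IDs that have already been loaded successfully (assumed bookkeeping).
    private final Set<String> loaded = new TreeSet<>();
    // Highest job ID of the last batch that completed without errors (null = none yet).
    private String highWaterMark = null;

    /**
     * Processes one batch of job IDs, skipping jobs that were loaded before.
     * `failing` simulates jobs whose load fails. Returns the IDs loaded in this call.
     */
    List<String> runBatch(List<String> batch, Set<String> failing) {
        List<String> newlyLoaded = new ArrayList<>();
        boolean batchOk = true;
        String maxInBatch = null;
        for (String jobId : batch) {
            if (maxInBatch == null || jobId.compareTo(maxInBatch) > 0) {
                maxInBatch = jobId;
            }
            if (loaded.contains(jobId)) {
                continue; // already processed on an earlier run: skip it
            }
            if (failing.contains(jobId)) {
                batchOk = false; // batch stays "unfinished", so it will be retried
                continue;
            }
            loaded.add(jobId);
            newlyLoaded.add(jobId);
        }
        // Only a fully successful batch may advance the high-water mark.
        if (batchOk && maxInBatch != null
                && (highWaterMark == null || maxInBatch.compareTo(highWaterMark) > 0)) {
            highWaterMark = maxInBatch;
        }
        return newlyLoaded;
    }

    String getHighWaterMark() {
        return highWaterMark;
    }
}
```

In this sketch, if `job_0002` fails on the first run, rerunning the same batch loads only `job_0002` and then advances the mark to `job_0003`, which mirrors the "resume instead of reprocess" behavior discussed in the thread.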

The reason getLastSuccessfulProcessRecord passes null to getProcessRecords, and why getProcessRecords can accept something other than null, is to allow historic runs while the most recent runs are going on. For the historic runs, we always set forceAllFiles to true.
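
The difference between passing null and passing a processing directory can be sketched with an in-memory store. Everything here is hypothetical: the `ProcessRecordStore` and `Rec` classes, the `State` values, the `processFile` field, the parameter lists, and the rule that any state other than `CREATED` counts as "successful" are assumptions for illustration, not hRaven's real signatures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory stand-in for process records; not hRaven's actual classes. */
class ProcessRecordStore {
    enum State { CREATED, LOADED, PROCESSED }

    static final class Rec {
        final String cluster;
        final long timestamp;     // assumed: when the record was created
        final State state;
        final String processFile; // assumed: path of the processing directory
        Rec(String cluster, long timestamp, State state, String processFile) {
            this.cluster = cluster;
            this.timestamp = timestamp;
            this.state = state;
            this.processFile = processFile;
        }
    }

    private final List<Rec> records = new ArrayList<>();

    void add(Rec r) {
        records.add(r);
    }

    /** Newest-first records for a cluster, excluding one state.
     *  A null processFileSubstring means "no directory filter". */
    List<Rec> getProcessRecords(String cluster, State excluded, int maxCount,
                                String processFileSubstring) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : records) {
            if (!r.cluster.equals(cluster) || r.state == excluded) continue;
            if (processFileSubstring != null && !r.processFile.contains(processFileSubstring)) continue;
            out.add(r);
        }
        out.sort(Comparator.comparingLong((Rec r) -> r.timestamp).reversed());
        return out.size() > maxCount ? out.subList(0, maxCount) : out;
    }

    /** Behavior described in the thread: no directory filter (null). */
    Rec getLastSuccessfulProcessRecord(String cluster) {
        return getLastSuccessfulProcessRecord(cluster, null);
    }

    /** Variant along the lines of this PR: optionally restrict to one processing directory. */
    Rec getLastSuccessfulProcessRecord(String cluster, String processingDirectory) {
        List<Rec> recs = getProcessRecords(cluster, State.CREATED, 1, processingDirectory);
        return recs.isEmpty() ? null : recs.get(0);
    }
}
```

Without a directory filter, the newest successful record wins regardless of which directory it came from; passing a directory restricts the lookup to that shard's records, which is the per-shard resume the PR asks for.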

Perhaps I'm not quite understanding your use case. Could you please elaborate?

jrottinghuis · Jul 01 '14 02:07

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Angad Singh does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant · Jul 18 '19 15:07