Insufficient information stored to recover the parent-child relationships between jobs from their `JobStore` output docs?
Please advise if I've misinterpreted the code/docs.
Assume:
- Storing `Job` outputs via `JobStore`.
- Not using a workflow manager, i.e. using `jobflow` directly.
For a given document in the `JobStore`, I see `uuid` and also `hosts` (which can be used to tell whether two `Job`s belong to the same `Flow`). However, as far as I can see, there is no way to determine the dependency relationship between two or more `Job` output documents. Is this correct?
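To illustrate the gap, here is a minimal sketch that groups stored documents by flow using only `hosts`. The documents are hypothetical dicts: `uuid` and `hosts` are the fields mentioned above, while `name` and `output` are illustrative. It also assumes `hosts[0]` is the flow that directly contains the job.

```python
from collections import defaultdict

# Hypothetical JobStore output documents (field values are made up).
docs = [
    {"uuid": "a", "name": "relax", "hosts": ["flow-1"], "output": 1.0},
    {"uuid": "b", "name": "static", "hosts": ["flow-1"], "output": 2.0},
    {"uuid": "c", "name": "other", "hosts": ["flow-2"], "output": 3.0},
]


def group_by_flow(documents):
    """Group job UUIDs by the flow that directly hosts them.

    Assumes hosts[0] is the innermost (directly hosting) flow;
    jobs without hosts are grouped under None.
    """
    groups = defaultdict(list)
    for doc in documents:
        hosts = doc.get("hosts") or [None]
        groups[hosts[0]].append(doc["uuid"])
    return dict(groups)


print(group_by_flow(docs))  # {'flow-1': ['a', 'b'], 'flow-2': ['c']}
```

Flow membership is recoverable this way, but nothing in these documents says whether job `b` consumed the output of job `a`.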
If so, is this intended? What would be a minimal way to retain this information without adding a dependency on a specific workflow manager?
Hi @mkhorton, this is something I've spoken to @gpetretto and @davidwaroquiers about. I believe the only other information you need to resolve the job dependencies is the `OutputReference`s in the job inputs. These are available through the `job.input_references` property.
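As a rough illustration of why the references are enough, the sketch below collects references from nested job inputs and reads off the parent job UUIDs. `ToyOutputReference` and `find_references` are hypothetical stand-ins written for this example; they are not jobflow's `OutputReference` class or its internal lookup code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyOutputReference:
    """Hypothetical stand-in for an output reference: points at a job's output."""
    uuid: str
    attributes: tuple = ()


def find_references(obj):
    """Recursively collect ToyOutputReference objects from nested inputs."""
    if isinstance(obj, ToyOutputReference):
        return [obj]
    if isinstance(obj, dict):
        items = list(obj.values())
    elif isinstance(obj, (list, tuple, set)):
        items = list(obj)
    else:
        return []  # plain values carry no dependency information
    refs = []
    for item in items:
        refs.extend(find_references(item))
    return refs


# A job whose inputs mix a direct reference, a nested one and a plain value.
args = (ToyOutputReference("job-a"), {"energy": ToyOutputReference("job-b", ("energy",))}, 3)
parents = {ref.uuid for ref in find_references(args)}
print(sorted(parents))  # ['job-a', 'job-b']
```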
The simplest way to enable this would be:
- At the beginning of the `job.run` function, copy the output of `job.input_references`. We have to copy them at the beginning because the `job.resolve_args` function resolves the references in place, so by the end of the function the original input references are no longer available.
- Add a new field, `"input_references"`, to the data stored at the end of `job.run`, e.g., here: https://github.com/materialsproject/jobflow/blob/53f0c76e4cc0baf40c1f9365b31ad2604cc8a601/src/jobflow/core/job.py#L599
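The two steps above can be sketched with a toy job class. `ToyJob`, `ToyOutputReference` and the dict-based store are hypothetical simplifications made for this example, not jobflow's `Job` or `JobStore`; the point is only the ordering: snapshot the references, resolve them in place, then persist the snapshot next to the output.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ToyOutputReference:
    """Hypothetical reference to another job's output."""
    uuid: str


@dataclass
class ToyJob:
    """Hypothetical simplified job; not jobflow's Job class."""
    uuid: str
    function: Callable[..., Any]
    function_args: list = field(default_factory=list)

    @property
    def input_references(self):
        # Recomputed from the current args, so it is empty once they are resolved.
        return tuple(a for a in self.function_args if isinstance(a, ToyOutputReference))

    def resolve_args(self, store):
        # Replace each reference with the stored output, mutating the list in place.
        for i, arg in enumerate(self.function_args):
            if isinstance(arg, ToyOutputReference):
                self.function_args[i] = store[arg.uuid]["output"]

    def run(self, store):
        # Step 1: snapshot the references before resolution destroys them.
        input_references = self.input_references
        self.resolve_args(store)
        output = self.function(*self.function_args)
        # Step 2: store the parents' UUIDs alongside the output.
        store[self.uuid] = {
            "uuid": self.uuid,
            "output": output,
            "input_references": [ref.uuid for ref in input_references],
        }
        return store[self.uuid]


store = {}
a = ToyJob("a", lambda: 2)
b = ToyJob("b", lambda x: x * 10, [ToyOutputReference("a")])
a.run(store)
print(b.run(store))  # {'uuid': 'b', 'output': 20, 'input_references': ['a']}
print(b.input_references)  # () because the references were resolved in place
```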
You should then be able to reconstruct the entire flow (including nested flows) and the dependencies between jobs. The only missing information will be the names of the `Flow`s (the job names are stored). This is because we don't store flows in the database directly.
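Assuming each stored document carries the proposed `input_references` field (simplified here to a list of parent job UUIDs), the dependency graph can be rebuilt with the standard library's `graphlib`. The documents below are hypothetical, and the sketch assumes each UUID appears in only one document; real data may need deduplication (e.g. several runs or indices of the same job).

```python
from graphlib import TopologicalSorter

# Hypothetical documents with the proposed "input_references" field.
docs = [
    {"uuid": "c", "name": "analyse", "hosts": ["flow-1"], "input_references": ["a", "b"]},
    {"uuid": "a", "name": "relax", "hosts": ["flow-1"], "input_references": []},
    {"uuid": "b", "name": "static", "hosts": ["flow-1"], "input_references": ["a"]},
]


def build_graph(documents):
    """Map each job UUID to the set of job UUIDs it depends on."""
    return {doc["uuid"]: set(doc.get("input_references", [])) for doc in documents}


graph = build_graph(docs)
# TopologicalSorter expects {node: predecessors}, which is exactly this mapping.
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['a', 'b', 'c']
```

Combined with `hosts`, the same documents would also let you regroup the jobs into (nested) flows, just without the flow names.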
Thanks for the reply @utf, good to know I wasn't missing anything obvious.
I'll see if I can make a PR to add this, unless @gpetretto or @davidwaroquiers are already working on it. If it would be welcome, I'd also like to add a `pydantic.BaseModel` to describe the `JobStore` document format.
A PR would be very welcome. And yes, agreed that we should have a document model for the job store document.
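A document model could look roughly like the sketch below. It uses a stdlib `dataclass` rather than `pydantic.BaseModel` to stay dependency-free; a pydantic model would declare the same fields and add type validation. The class name and field list are assumptions: `uuid`, `name` and `hosts` come from this thread, `input_references` is the field proposed above, and the real document may contain other fields.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class JobStoreDocument:
    """Hypothetical schema for a JobStore output document."""
    uuid: str
    name: str
    output: Any = None
    hosts: list[str] = field(default_factory=list)
    input_references: list[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.uuid:
            raise ValueError("uuid must be non-empty")

    @classmethod
    def from_dict(cls, data: dict) -> "JobStoreDocument":
        # Keep only keys the model knows about; ignore anything else in the raw doc.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in names})


raw = {"uuid": "b", "name": "static", "hosts": ["flow-1"], "input_references": ["a"], "extra": 1}
doc = JobStoreDocument.from_dict(raw)
print(asdict(doc))
```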
It would indeed be very useful to be able to "reconstruct" the Flow(s) after they have run (or while they are running) in order to visualize them. We have already discussed this but haven't started working on it. This issue is also one of a set of somewhat interconnected features that would be nice to have. I'd like to propose a meeting with the most active developers/contributors to list and plan the short- and mid-term developments. @utf, what do you think?
@mkhorton did you end up starting work on this? I offered to make some contributions to jobflow and would love to tackle this, and am planning to start working on it now. Happy to hold off/coordinate though if you have any concerns or WIP.
By all means, Max, go ahead! I do not have a WIP. Let me know if you have any problems, however (perhaps open a PR early so anyone interested can comment?)
Sounds good Matt! Early PR is a good idea for sure.