Reconstruct flow from outputs in JobStore [WIP]

Open mcgalcode opened this issue 2 years ago • 8 comments

Summary

This is a WIP PR addressing #374. I implemented the storage of input references in the JobStoreDocument (from Hrushikesh's PR (here)).

Checklist

Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request title.

Before a pull request can be merged, the following items must be checked:

  • [X] Code is in the standard Python style. The easiest way to handle this is to run the following in the correct sequence on your local machine. Start with running black on your new code. This will automatically reformat your code to PEP8 conventions and remove most issues. Then run pycodestyle, followed by flake8.
  • [X] Docstrings have been added in the Numpy docstring format. Run pydocstyle on your code.
  • [X] Type annotations are highly encouraged. Run mypy to type check your code.
  • [ ] Tests have been added for any new functionality or bug fixes.
  • [ ] All linting and tests pass.

Note that the CI system will run all the above checks. But it will be much more efficient if you already fix most errors prior to submitting the PR. It is highly recommended that you use the pre-commit hook provided in the repository. Simply cp pre-commit .git/hooks and a check will be run prior to allowing commits.

mcgalcode avatar Sep 09 '23 00:09 mcgalcode

I think there are a lot of possible interaction patterns people will use for retrieving and navigating flow outputs, so I just sketched some out here. I'd like to get some feedback from @utf and @arosen93 to see if any of this looks good.

I'm happy to rework/remove any of this per consensus here, but just wanted to throw something out there for a first pass.

mcgalcode avatar Sep 09 '23 00:09 mcgalcode

The interface of FlowOutput (here) sort of sketches out the interaction patterns I was imagining.

mcgalcode avatar Sep 09 '23 00:09 mcgalcode

Hi @mcgalcode ,

Very nice that you started this WIP PR, and great start! I did not get into all the details, but from what I understand of the implementation, the Flow can be reconstructed if its jobs have finished (as job output documents are inserted into the database only when they complete). Do I understand this correctly? I guess it would be nice to reconstruct a Flow even when it hasn't started, or while some of its jobs have completed and some others are running (or waiting), or to be able to reconstruct a flow with a job that has failed (the failed job won't appear in the reconstructed Flow, I think, as there won't be any job document for it).

Any ideas ?

davidwaroquiers avatar Sep 14 '23 11:09 davidwaroquiers

Hi @davidwaroquiers ! Sorry for the delayed response here, I think I missed turning on notifications for this one.

You understand correctly. This code reconstructs a flow from output objects that are present in the main output document store.
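
To make that concrete, here is a minimal sketch of the idea, not the code in this PR. It uses plain dicts in place of real store documents, and the field names (`uuid`, `hosts`, `input_references`) as well as the helper `reconstruct_graph` are assumptions made for illustration. It also shows why a failed or unfinished job cannot appear in the result: the job has no output document, so it only shows up as a dangling reference.

```python
from collections import defaultdict

# Hypothetical output documents. The field names are assumptions for
# illustration and may not match the real JobStore schema.
docs = [
    {"uuid": "a", "name": "relax", "hosts": ["flow-1"], "input_references": []},
    {"uuid": "b", "name": "static", "hosts": ["flow-1"], "input_references": ["a"]},
    # Job "c" failed or never ran, so it has no output document at all.
    {"uuid": "d", "name": "bands", "hosts": ["flow-1"], "input_references": ["b", "c"]},
]


def reconstruct_graph(docs, flow_uuid):
    """Rebuild parent -> children edges for one flow from its output docs.

    Returns the edges plus the sorted uuids that are referenced as inputs
    but have no output document (failed, running, or not yet started).
    """
    in_flow = {d["uuid"]: d for d in docs if flow_uuid in d["hosts"]}
    edges = defaultdict(list)
    missing = set()
    for uuid, doc in in_flow.items():
        for parent in doc["input_references"]:
            if parent in in_flow:
                edges[parent].append(uuid)
            else:
                missing.add(parent)
    return dict(edges), sorted(missing)


edges, missing = reconstruct_graph(docs, "flow-1")
print(edges)
print(missing)
```

Here `missing` contains `"c"`, which is about as much as can be recovered for a job that never wrote an output document.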

I like the idea of being able to reconstruct flows that have yet to be started, or flows that are incomplete or partially failed, but I'm a little hazy on how that would work. For instance, if a flow hasn't started yet (i.e., it hasn't been run by some type of manager), I thought it doesn't exist anywhere aside from the memory of the program that instantiated it. Is that not the case?

I can imagine doing this in the case of a particular manager, i.e. the fireworks manager. In that case, I think utilities for reconstructing a Flow from its representation in fireworks would be particularly useful, and I actually have some sketchy helper functions that I use for this in my own work. Is that what you mean?

I think my jobflow understanding may be a bit insufficient here :)

mcgalcode avatar Sep 19 '23 20:09 mcgalcode

No, you are perfectly right. I keep thinking in terms of an ongoing development that you are not yet aware of (which is normal, as it's in private repos). I will add you to the repos (it's a remote execution mode of jobflow, which ultimately might be included directly into jobflow itself). If you have questions about it, feel free to contact me and I can give you a few more details.

davidwaroquiers avatar Sep 20 '23 10:09 davidwaroquiers

@davidwaroquiers Aha! Thanks for this clarification, I'll take a look at the repos you added me to. Sounds like that work inspired your suggestion :)

I do think that the functionality you're talking about would be very useful. It can be a little confusing to interact with job outputs since there is no formal record of whatever flow they belong to.

mcgalcode avatar Sep 21 '23 00:09 mcgalcode