Should `task_label` always match job name?

Open jmmshn opened this issue 2 years ago • 7 comments

This line is causing some problems for me: https://github.com/materialsproject/atomate2/blob/a31b86b1e2f4d1a6665be7764c5cc67673904c91/src/atomate2/vasp/jobs/base.py#L149

This seems to indicate that if the user wants to change the name of a job (ex. adding formulas to the names) it will break subsequent querying of the resulting tasks database.

I like seeing the formula names in the FW web GUI as I'm working on new workflows. Not sure how everyone else feels.

jmmshn avatar Jul 02 '22 00:07 jmmshn

It seems like task_label has been a little overloaded: it's often used to encode the type of calculation, and also a user-readable name.

Perhaps we just need multiple fields? A label (which is just that, a human-readable label, arbitrary), but also store the maker name as a separate field?
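
A minimal sketch of that split, assuming a hypothetical record type rather than the actual atomate2 task document (`TaskRecord`, `label`, `maker_name`, and `make_record` are illustrative names only):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    """Hypothetical record; not the atomate2 or emmet task document."""

    label: str       # free-form, human-readable, safe to rename
    maker_name: str  # stable identifier of the maker that produced the task


def make_record(maker_name: str, formula: Optional[str] = None) -> TaskRecord:
    # The label can be decorated (e.g. with a formula) without touching maker_name.
    label = f"{formula} {maker_name}" if formula else maker_name
    return TaskRecord(label=label, maker_name=maker_name)


record = make_record("static", formula="Si2")
print(record.label)       # decorated, for display
print(record.maker_name)  # unchanged, for querying
```

With this layout, queries would filter on the stable field and the label would stay purely cosmetic.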

mkhorton avatar Aug 29 '22 23:08 mkhorton

agreed, so I guess this change should happen when the document models move to emmet eh? I linked this in the other thread.

We should tabulate a list of these things and just do the change once.

jmmshn avatar Aug 29 '22 23:08 jmmshn

I'm not sure I see the problem. This is also how it's done in atomate1. The calculation type is available through task_type and calc_type.

> This seems to indicate that if the user wants to change the name of a job (ex. adding formulas to the names) it will break subsequent querying of the resulting tasks database.

What sort of querying are you thinking about? If you add formulas you can always query using regex matching.
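
As a rough illustration of the regex approach, here is a sketch using Python's `re` module on in-memory documents. The documents and labels are made up, and the `task_label` field path in the MongoDB-style filter is an assumption about how the documents are stored:

```python
import re

# Hypothetical task documents; the labels are invented for illustration.
docs = [
    {"task_label": "static"},
    {"task_label": "Si2 static"},
    {"task_label": "Si2 relax 1"},
]


def find_by_label(documents, pattern):
    """Return the documents whose task_label matches the regex pattern."""
    regex = re.compile(pattern)
    return [d for d in documents if regex.search(d["task_label"])]


# Match labels that end in the word "static", with or without a formula prefix.
static_docs = find_by_label(docs, r"\bstatic$")

# The equivalent filter in MongoDB query syntax (field path assumed).
mongo_filter = {"task_label": {"$regex": r"\bstatic$"}}
```

This only works as long as the original job name survives somewhere inside the modified label.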

utf avatar Aug 30 '22 08:08 utf

On Matt's point. I am also thinking to store the maker/function in the output database. However, it wouldn't be part of the task document itself but one level higher up. E.g., as part of this dict: https://github.com/materialsproject/jobflow/blob/073266cf8a3e9e06abf351a2728a46f159d99f32/src/jobflow/core/job.py#L579-L586

I'd be happy to accept that as a PR :)

utf avatar Aug 30 '22 08:08 utf

> What sort of querying are you thinking about? If you add formulas you can always query using regex matching.

I'm thinking about how builders for smaller research projects typically rely on (not very robust) queries of the tasks database. This gets compounded a little because, as people write more dynamic workflows in atomate2, they basically have to use custom job names to navigate their own workflows. The people building the workflows might not be conscious of the query problems they can cause later, and then end up with a name that is difficult or even impossible to regex.

jmmshn avatar Aug 30 '22 16:08 jmmshn

@utf so I think the problem here is that self.name gets modified by things like append_name, which basically indicates that you should change it as part of your workflow. But it then gets used by the task document as a stand-in for calc_type, so it's serving two somewhat different functions at the same time, which I think is problematic. If we assign calc_types to the different InputSetGenerators and grab that value, that should sort everything out automatically, right?
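
To make the two roles concrete, here is a toy sketch. `ToyJob` is a stand-in written for this example, not the jobflow `Job` class, and the `calc_type` value is illustrative only:

```python
class ToyJob:
    """Toy stand-in for a workflow job; not the jobflow Job class."""

    def __init__(self, name, calc_type):
        self.name = name            # user-facing and expected to be edited
        self.calc_type = calc_type  # assumed to come from the input set generator

    def append_name(self, text, prepend=False):
        # Mimics the idea of decorating a job name inside a workflow.
        self.name = f"{text}{self.name}" if prepend else f"{self.name}{text}"


job = ToyJob(name="static", calc_type="GGA Static")
job.append_name("Si2 ", prepend=True)

# If the label is copied from the name, it changes with every rename,
# while a separately assigned calc_type stays queryable.
task_doc = {"task_label": job.name, "calc_type": job.calc_type}
```

An exact-match query on the label "static" would now miss this document, whereas a query on the calculation type would still find it.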

jmmshn avatar Feb 15 '23 22:02 jmmshn

> On Matt's point. I am also thinking to store the maker/function in the output database. However, it wouldn't be part of the task document itself but one level higher up. E.g., as part of this dict: https://github.com/materialsproject/jobflow/blob/073266cf8a3e9e06abf351a2728a46f159d99f32/src/jobflow/core/job.py#L579-L586
>
> I'd be happy to accept that as a PR :)

I am in need of this feature to navigate some of my workflows. I'd be happy to implement it if we still think this is what we want? @utf

I see this as a different feature to @jmmshn's comments on Feb 15.

mjwen avatar May 11 '23 17:05 mjwen