aiida-core icon indicating copy to clipboard operation
aiida-core copied to clipboard

`ProcessFunction`: Add support for variadic arguments

Open sphuber opened this issue 1 year ago • 13 comments

Fixes #1172

Up till now, variadic arguments, i.e., arguments defined as *args in a function signature which collects any remaining positional arguments, were not supported for process functions. The main reason was that it wasn't immediately clear what the link label should be for these inputs.

For normal positional arguments we can take the name of the argument declaration in the function signature, and for keyword arguments we take the keyword with which the argument is passed in the function invocation. But for variadic arguments there is no specific argument name, not in the function signature, nor in the function invocation. However, we can simply create a link label. We just have to ensure that it doesn't clash with the link labels that will be generated for the positional and keyword arguments.

Here the link label will be determined with the following format:

`{label_prefix}_{index}`

The label_prefix is determined the name of the variadic argument. If a function is declared as function(a, *args, **kwargs) the prefix will be equal to args and if it is function(*some_var_args) it will be some_var_args. The index will simply be the zero-base index of the argument within the variadic arguments tuple. This would therefore give link labels args_0, args_1 etc. in the first example.

If there would be a clash of labels, for example with the function def:

def function(args_0, *args):

which when invoked as:

function(1, *(2, 3))

would generate the labels args_0 for the first positional argument, but also args_0 for the first variadic argument. This clash is detected and in this case the label for the variadic arguments is updated by adding the first four characters of a UUID4 string before the argument index, e.g., args_be4a_0 etc. This is done in a loop until none of the variadic argument labels clash any longer.

sphuber avatar Oct 06 '22 09:10 sphuber

Hey @sphuber do you have a specific use case in mind for this?

I would point to the zen python "explicit is better than implicit" and ask why can't we just allow for kwargs and avoid such implicit behaviour?

chrisjsewell avatar Oct 06 '22 09:10 chrisjsewell

Hey @sphuber do you have a specific use case in mind for this?

See #1172 . Apparently there were multiple use-cases from users. I don't have one personally but I can definitely see the point. I don't see a strong reason not to allow it. I mean, Python itself supports variadic arguments, so why would we prohibit it?

And finally, users are not forced to use it. If they prefer to be more explicit with positional and keyword arguments, they can do so without any issues.

Together with #5688 to support automatic base type input serialization, I think process functions can become even easier to use. You would be able to

@calcfunction
def sum_integers(*args):
    return sum(args)

assert sum_integers(1, 2, 3) == 6

sphuber avatar Oct 06 '22 09:10 sphuber

By the way, once agreed upon the interface, I will update the documentation.

sphuber avatar Oct 06 '22 09:10 sphuber

See https://github.com/aiidateam/aiida-core/issues/1172 . Apparently there were multiple use-cases from users.

This is not a use case to use varadic arguments though, its an argument to use a list of nodes an input, and similarly we should make it easy to use a dict of nodes as an input.

There is an obvious way how to specify in the function construction, what the inputs should be (also allowing for type validation): type annotations:

@calcfunction
def sum_integers(integers: list[orm.Int]) -> orm.Int:
    return orm.Int(sum(*(i.value for i in integers)))

You could even extend this kind of type validation further in the future with: https://docs.python.org/3/library/typing.html#typing.Annotated

Secondly, with the use of input/output lists and dicts, it should be easier to "reconstruct" them. At present the link table of AiiDA is quite restrictive: https://github.com/aiidateam/aiida-core/blob/1c5dfef09d18895ce327b4b145466373f9f05118/aiida/storage/psql_dos/models/node.py#L149

I think it might be beneficial to add a metadata colmn, where you could signify that this link is part of a list or dict, and the index/key for that list/dict

chrisjsewell avatar Oct 06 '22 13:10 chrisjsewell

Thanks for working on this! I spent quite some time banging my hand against this restriction. :face_with_head_bandage:

Hey @sphuber do you have a specific use case in mind for this?

Just as an example of my use case, functionality like this is needed if the number of inputs is not known beforehand. Right now instead of simple calcfunction I need to use a WorkChain with an input with dynamic namespace. Here's an example where I simply want to combine multiple StructureData node into a single TrajectoryData node.

https://github.com/danielhollas/aiidalab-ispg/blob/13f294438422a4eccfe2647baca621860d2416fd/workflows/aiidalab_atmospec_workchain/init.py#L55

(happy to hear if there's an easier way)

danielhollas avatar Oct 07 '22 00:10 danielhollas

Thanks for the input @danielhollas . You could in principle use kwargs for your use case. You just have to create a dictionary with arbitrary keys as an input for your StructureData. Something like this:

@calcfunction
def structures_trajectory(**structures):
    return TrajectoryData([structure for structure in structures.values()])

structures = {
    'structure_a': StructureData(),
    'structure_b': StructureData(),
    'structure_c': StructureData(),
}

structures_trajectory(**structures)

With the proposed new syntax, one could do the following:

@calcfunction
def structures_trajectory(*structures):
    return TrajectoryData(structures)

structures = (StructureData(), StructureData())

structures_trajectory(*structures)

I personally think both are fine. As you can see the current existing solution is quite reasonable, it is just that quite a number of users were stumped to find out that varargs are not supported when that is possible in normal Python. I think it couldn't hurt to support the option for those who want it.

sphuber avatar Oct 07 '22 06:10 sphuber

I agree with Chris that the underlying feature people are looking for here is not so much variadic arguments but being able to work with, and seamlessly pass around compounds (lists, dicts, ...) of AiiDA objects.

That the provenance graph does not have a concept of a list of objects (or, in other words, that you can't have an orm.List of AiiDA objects) is quite a significant limitation of the AiiDA language that seems worth tackling.

As for the specific issue in this thread, given that we need to create named links for function inputs, I think requiring inputs to functions to be named is reasonable; I think it just needs to be well documented with examples for canonical use cases like the one above.

ltalirz avatar Oct 07 '22 10:10 ltalirz

As for the specific issue in this thread, given that we need to create named links for function inputs, I think requiring inputs to functions to be named is reasonable; I think it just needs to be well documented with examples for canonical use cases like the one above.

@ltalirz I think as the various responses from users testify, using varargs is something that people would naturally do. Even if it is documented somewhere that this is not supported with a counter-example based on kwargs, people are unlikely to find this beforehand and not run into this limitation.

I agree that ideally lists and dictionaries of inputs would be supported, but it is not clear how much work it would take to implement this. Is there a real objection to add this functionality in the meantime? Is there a real downside? Adding this now doesn't prevent us from adding direct support for lists and dictionaries later on, would it?

sphuber avatar Oct 07 '22 11:10 sphuber

Is there a real objection to add this functionality in the meantime?

My objection would be that the logic in the implementation and the labels it generates are non-obvious. Once it is in aiida-core, users can start relying on it, and it becomes difficult to change.

An alternative stop-gap solution would be to replace the error message variadic arguments are not supported with something more helpful like

Variadic arguments are not supported. Instead of a list `*args`, pass a dictionary `**kwargs`, whose keys will be used as link labels.

Anyhow, I'm not going to block the PR if people feel strongly that support for *args is needed.

ltalirz avatar Oct 07 '22 13:10 ltalirz

My objection would be that the logic in the implementation and the labels it generates are non-obvious. Once it is in aiida-core, users can start relying on it, and it becomes difficult to change.

Yeh this was kinda my feeling also 😬

chrisjsewell avatar Oct 07 '22 13:10 chrisjsewell

You could in principle use kwargs for your use case. You just have to create a dictionary with arbitrary keys as an input for your StructureData

Oh cool! I didn't know that! Thanks so much @sphuber Nevertheless, I'd say it is still weird to use a dictionary for passing in a list, imo weirder than having surprising link labels, but I don't know enough about how the link labels are generally used by users. In any case, arg_x seems like a very straightforward naming, and the potential clashes with positional arguments should be very rare, why would anybody name the positional argument as arg_0 for example?

An alternative stop-gap solution would be to replace the error message variadic arguments are not supported with something more helpful like

This would go a long way tbh, please implement this regardless of the result of the discussion here. :blush:

danielhollas avatar Oct 07 '22 14:10 danielhollas

I'd also say that calcfunctions, workfunctions are so much more lightweight than CalcJobs / WorkChains, so making them as useful and as painless to use is a goal worth pursuing.

danielhollas avatar Oct 07 '22 14:10 danielhollas

Anyhow, I'm not going to block the PR if people feel strongly that support for *args is needed.

For me it is also not crucial to have this functionality, I have personally never felt the need, but that is probably because I, as a developer, was aware of the limitation and the alternatives. I simply picked up the existing issue since I worked on the process functions anyway (for the automatic input serialization) and since there seemed to be multiple people asking for it, I just implemented it.

In any case, arg_x seems like a very straightforward naming, and the potential clashes with positional arguments should be very rare, why would anybody name the positional argument as arg_0 for example?

This was exactly my thinking. In most cases, people don't even really care about the generated link labels, but in this case, having args_0 etc. seemed perfectly logical to me. The exception in the case of a clash should indeed be super rare and practically never occur, but I put in the code to cover all the bases and not have exceptions in those rare cases.

Anyway, I think all has been said what needs to be said. I propose I quickly present this during the next AiiDA meeting and then we can see what other people think. If they share your concern, then I am perfectly fine with leaving it as is (with an update of the docs explaining this use-case and pointing to the **kwargs usage).

sphuber avatar Oct 07 '22 15:10 sphuber

Thanks @sphuber, and all for the feedback. Here is my take.

  • There are two issues.
    • One is allowing this behaviour (I see why people want it, there is an example above where this is useful and we don't really care of the label so I would support this PR (with the comment/change below)
    • Then, there is the issue of "list of AiiDA nodes" and "dicts of AiiDA nodes". I totally see the need, but this is more complex to fix (and only partially overlapping - in the case of @danielhollas, I think he would be fine with this PR and does not really need to define a node being the list of multiple StructureData nodes). Other cases instead would require this concept to be in the graph and immutable. So for this latter case, we should open another issue and investigate what needs to be changed (e.g., one of the many possible approaches: create a new new node type "DictOfNodes", and define a new link type "MemberOf" between data nodes - but we should really discuss in another issue).

So, I would be OK with merging this, with one change: I don't like names being created with UUID and in a non-reproducible/understandable way. I suggest that if a potential name clash is found (e.g. one has *args plus any other argument starting with args_ followed by a positive integer), we just raise (similar to what we are now doing with the message Variadic arguments are not supported), explaining the limitation. This should be enough to avoid unexpected behaviour, and at the same point to have the developed fix this at their first use of the (calc/work)function they just wrote. Probably nobody will ever see this exception, but if one sees it, it's a 5-second thing to change the variable name accordingly.

giovannipizzi avatar Oct 19 '22 09:10 giovannipizzi