jsonargparse CLI: Allow arguments of another function as a group

🚀 Feature request

Support the ability to configure a function's arguments as a group. To explain further, here is an example of how this might look; I'm not tied to this particular implementation:

Provide a helper function function_signature_type(f: Callable) that inspects f and returns a dynamically defined dataclass f_Args containing the parameters of the function as fields. The user would then write:

def f(foo: str):
    ...


def main(f_args: function_signature_type(f)):
    ...


CLI(main)

This would support passing --f_args.foo hello on the command line.

Motivation

If I'm using CLI(main) but want the arguments of a function named helper to be passed in from the command line, I currently have to either declare them all as arguments of main, or manually define a dataclass to hold them and declare a single argument of that type.
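For reference, the manual workaround looks roughly like the sketch below. The names `helper`, `HelperArgs` and their parameters are hypothetical. Every parameter is declared twice, once in `helper` and once in the dataclass, which is the duplication a generated type would remove. With CLI(main), the dataclass fields would presumably be exposed as options such as --helper_args.name.

```python
from dataclasses import asdict, dataclass


def helper(name: str, punctuation: str = '!') -> str:
    # Hypothetical function whose arguments should come from the command line
    return f'Hello, {name}{punctuation}'


@dataclass
class HelperArgs:
    # Hand-written mirror of helper's signature; must be kept in sync manually
    name: str
    punctuation: str = '!'


def main(helper_args: HelperArgs) -> str:
    # Unpack the grouped arguments back into a call to helper
    return helper(**asdict(helper_args))


print(main(HelperArgs(name='world')))  # Hello, world!
```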

But this can be done dynamically:


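import inspect
from dataclasses import field, make_dataclass
from typing import Any


def function_signature_type(func):
    """Return a dataclass named <func>_Args whose fields mirror func's parameters."""
    fields = []
    for param in inspect.signature(func).parameters.values():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args and **kwargs cannot become dataclass fields
        # Prefer the annotation; fall back to the type of the default, then Any
        if param.annotation is not param.empty:
            annotation = param.annotation
        elif param.default is not param.empty:
            annotation = type(param.default)
        else:
            annotation = Any
        if param.default is param.empty:
            fields.append((param.name, annotation))
        else:
            # Keep the default so the generated field is optional as well
            fields.append((param.name, annotation, field(default=param.default)))

    return make_dataclass(f'{func.__name__}_Args', fields)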

Mar 24 '23 19:03 indigoviolet

A somewhat related question: currently, if I annotate an argument with a class FooClass, jsonargparse gives me an instance of FooClass. I have a use case where I actually want the "serialized" arguments used to construct that instance, so that I can create the instance later. To clarify further, see:

def remote_fn(foo_args):
    foo = foo_args()  # instantiate FooClass on the remote side
    ...


def main(foo: FooClass):
    ...  # something like remote_fn(foo)

In this example, remote_fn will actually end up executing on a different machine/process, so I can't pass it foo directly, I need to serialize it again into foo_args and instantiate it remotely somehow. But jsonargparse is already handling that process, so it would be nice if it could give the serialized version and a way to deserialize it.
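One way to picture "the serialized version and a way to deserialize it" is a plain dict that names the class by its import path plus its init arguments; jsonargparse uses a similar class_path/init_args layout for subclass-typed options. The helper `instantiate_from_spec` below is a hypothetical, stdlib-only sketch, not a jsonargparse API. Because the dict is JSON-serializable, it can be shipped to another machine or process and instantiated there.

```python
import importlib
import json


def instantiate_from_spec(spec):
    """Hypothetical helper: import spec['class_path'] and call it with spec['init_args']."""
    module_name, _, class_name = spec['class_path'].rpartition('.')
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**spec.get('init_args', {}))


def remote_fn(foo_spec_json):
    # Runs on the other machine/process: rebuild the object from its serialized arguments
    return instantiate_from_spec(json.loads(foo_spec_json))


# Serialized arguments, e.g. what a parser produced before instantiation
spec = {'class_path': 'calendar.Calendar', 'init_args': {'firstweekday': 1}}
payload = json.dumps(spec)  # a plain string, safe to send across processes
cal = remote_fn(payload)
print(type(cal).__name__, cal.firstweekday)  # Calendar 1
```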

Mar 24 '23 23:03 indigoviolet

Thank you for the proposal!

Given the related question, it seems you just want more control. My suggestion would be not to use the CLI function; CLI is not intended for arbitrarily complex use cases. Instead, you should create the parser yourself and manually parse, instantiate, call functions, etc. For example, you can do:

from jsonargparse import ArgumentParser

parser = ArgumentParser()
parser.add_function_arguments(f, 'f_args')
parser.add_subclass_arguments(FooClass, 'foo')

cfg = parser.parse_args()
...
init = parser.instantiate_classes(cfg)  # instantiate all classes in cfg if needed
...
# in cfg.foo you have the parameters used to instantiate foo
...
f(**init.f_args)  # call the f function using the parsed cfg.f_args

Mar 26 '23 17:03 mauvilsa

Thanks for considering and explaining.

It would be really convenient if there were a more gradual path from the one-liner of CLI with nice "centralized" annotations to a full-blown ArgumentParser with everything having to be written out twice.

One idea in that direction would be to provide a way to control how the ArgumentParser inside CLI is affected by a particular type annotation in main, perhaps via some "callback hooks" that I could implement just for f_args and FooClass, which would call .add_function_arguments and later receive the parsed data. Here's a very rough sketch of what I'm imagining:

from dataclasses import dataclass
from typing import Callable

from jsonargparse import CLI


# CustomType is the hypothetical hook base class being proposed
class FooContainer(CustomType):
    def configure_parser(self, parser, arg_name):
        parser.add_subclass_arguments(FooClass, arg_name)


@dataclass
class FunctionArguments(CustomType):
    fn: Callable

    def configure_parser(self, parser, arg_name):
        parser.add_function_arguments(self.fn, arg_name)


def main(foo: FooContainer, f_args: FunctionArguments(f)):
    # foo would be passed in as a FooContainer, and foo.params would be the original params
    # f_args would be passed in as a FunctionArguments, and f_args.params would be the parsed values
    ...


if __name__ == '__main__':
    CLI(main)
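To make the proposed hook concrete, here is a toy sketch built on the standard library's argparse rather than jsonargparse. A hypothetical `mini_cli` walks main's annotations and lets each one add its own options through configure_parser, then hands back an object whose params attribute holds the parsed values. `CustomType`, `build`, `mini_cli` and `params` are illustrative names taken from or invented for the proposal; none of them are existing jsonargparse features. For brevity it only handles annotations that are instances (like FunctionArguments(f)), not classes like FooContainer.

```python
import argparse
import copy
import inspect
from dataclasses import dataclass
from typing import Callable


class CustomType:
    """Hypothetical base class: each annotation decides how its parameter is parsed."""

    def configure_parser(self, parser, arg_name):
        raise NotImplementedError

    def build(self, namespace, arg_name):
        # Gather the options this hook registered under the "arg_name." prefix
        prefix = arg_name + '.'
        built = copy.copy(self)  # don't mutate the shared annotation object
        built.params = {key[len(prefix):]: value
                        for key, value in vars(namespace).items()
                        if key.startswith(prefix)}
        return built


@dataclass
class FunctionArguments(CustomType):
    fn: Callable

    def configure_parser(self, parser, arg_name):
        # One --arg_name.param option per parameter of fn
        for p in inspect.signature(self.fn).parameters.values():
            parser.add_argument(
                f'--{arg_name}.{p.name}',
                type=str if p.annotation is p.empty else p.annotation,
                required=p.default is p.empty,
                default=None if p.default is p.empty else p.default,
            )


def mini_cli(main, argv=None):
    """Toy stand-in for CLI(main); assumes every parameter is annotated with a CustomType instance."""
    parser = argparse.ArgumentParser()
    hooks = {name: p.annotation for name, p in inspect.signature(main).parameters.items()}
    for name, hook in hooks.items():
        hook.configure_parser(parser, name)
    namespace = parser.parse_args(argv)
    return main(**{name: hook.build(namespace, name) for name, hook in hooks.items()})


def f(foo: str, times: int = 1):
    return foo * times


def main(f_args: FunctionArguments(f)):
    # f_args.params holds the parsed values for f's parameters
    return f(**f_args.params)


print(mini_cli(main, ['--f_args.foo', 'ab', '--f_args.times', '2']))  # abab
```

Even this toy shows the concern raised below: the annotation FunctionArguments(f) carries parsing logic, and it is a call expression rather than a type.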

Mar 28 '23 22:03 indigoviolet

Indeed CLI is nice because of the "centralized" annotations. However, why do you say that ArgumentParser requires writing things twice? The methods that add arguments from function/class signatures exist precisely so that there is no need to duplicate.

Alternatively, if you want a main function with centralized annotations and want to avoid creating the parser manually, but still need access to the parsed values, you could do:

from jsonargparse import CLI, capture_parser

parser = capture_parser(lambda: CLI(main))
cfg = parser.parse_args()
...
init = parser.instantiate_classes(cfg)  # instantiate all classes in cfg if needed
...
# in cfg.foo you have the parameters used to instantiate foo
...
f(**init.f_args)  # call the f function using the parsed cfg.f_args

It is great to read your proposals and get a glimpse of what people are thinking about and struggling with. There are some reasons why I am not particularly fond of these configure_parser methods. They are not easy to explain, but I will try. Please don't stop proposing ideas because of this, though.

While designing new features for jsonargparse, I have been following some principles intended to guide people using jsonargparse toward writing good code. One of these principles is separation of concerns. Ideally, a command line tool should be implementable in a single Python file, without the need to add jsonargparse-related logic anywhere else. Furthermore, it should be possible to make third-party code imported from a different package configurable. For example, one can do:

from jsonargparse import CLI
from third.party.package import func1, func2, func3

if __name__ == '__main__':
   CLI([func1, func2, func3])

Type annotations in third.party.package should make sense exclusively for that package. If magic methods such as configure_parser are supported, it is impossible to prevent people from implementing them in third-party packages. Thus I am quite hesitant to add command line parsing logic to type hints. Type annotations in modules that are not about command line parsing should be unaware of the existence of jsonargparse, just follow standard Python practices, and be correct according to type checkers such as mypy and pyright. Also, I guess FunctionArguments(f) would be problematic for type checkers.

Apr 01 '23 06:04 mauvilsa

Thanks for the thoughtful explanation.

I agree that the types don't actually have to be written out twice, but it's still unfortunate that there is a discontinuity in CLI's utility -- you suddenly have to switch to another API and recall how that worked. CLI is great for quick scripts where you focus on the script's functionality, so it's nice not to have to switch to ArgumentParser.

That said, I agree that we want to keep the type checkers happy and not introduce weird coding practices.

Regarding the configure_parser objection you raised, I'm not sure I understand the third-party package argument: why would someone making a third-party package try to implement these methods? The analogue today would be a third-party package implementing functions that take an ArgumentParser and call add_method_arguments etc., but I'm hard pressed to see why a package author would configure how command-line options are parsed when their package is used, especially since that would tie them to one argument parsing library.

You're of course right about FunctionArguments(f) not being a valid type annotation as described above, but I think that could be overcome doing something like this:

def FunctionArguments(f) -> CustomType:
    def configure_parser(parser, arg_name):
        parser.add_function_arguments(f, arg_name)
    return CustomType(configure_parser=configure_parser)
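As a runtime sanity check of the factory idea, the sketch below uses a hypothetical minimal CustomType that just stores the callback, and stdlib argparse in place of jsonargparse's add_function_arguments. Without `from __future__ import annotations`, the call FunctionArguments(f) runs when main is defined, so the annotation is simply a CustomType instance that a CLI could inspect. Static checkers such as mypy generally still flag a call expression used as a type, so this addresses the runtime side only.

```python
import argparse
import inspect


class CustomType:
    """Hypothetical carrier for a configure_parser callback."""

    def __init__(self, configure_parser):
        self.configure_parser = configure_parser


def FunctionArguments(f) -> CustomType:
    # The closure captures f, so no class attribute or `self` is needed
    def configure_parser(parser, arg_name):
        # argparse stand-in for add_function_arguments: one option per parameter
        for p in inspect.signature(f).parameters.values():
            parser.add_argument(f'--{arg_name}.{p.name}',
                                required=p.default is p.empty,
                                default=None if p.default is p.empty else p.default)
    return CustomType(configure_parser=configure_parser)


def f(foo: str, bar: str = 'x'):
    return foo + bar


def main(f_args: FunctionArguments(f)):
    ...


# The annotation was evaluated at definition time and is a CustomType instance
hook = inspect.signature(main).parameters['f_args'].annotation
parser = argparse.ArgumentParser()
hook.configure_parser(parser, 'f_args')
print(vars(parser.parse_args(['--f_args.foo', 'hi'])))  # {'f_args.foo': 'hi', 'f_args.bar': 'x'}
```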

Most of this is aiming to provide a gradual path from CLI to full ArgumentParser control; but perhaps the complexity of such configure_parser functions will already outweigh the cost of transitioning to the more straightforward ArgumentParser APIs.

We can table this idea for now that the general thought is in your head; if there is enough demand in this direction, maybe you can come back to it.

Apr 04 '23 00:04 indigoviolet

I'm not sure I understand the third-party package argument

I think my explanation put too much focus on third-party code. Let me explain further. The point is separation of concerns. If we have

from jsonargparse import CLI
from some.other.module import func1, func2, func3

if __name__ == '__main__':
   CLI([func1, func2, func3])

there are two main cases:

1. some.other.module is part of some other project maintained by people other than the CLI developer. The CLI developer might think that using configure_parser requires modifying some.other.module and invest time trying to do so. If the maintainers of some.other.module are persuaded and accept CLI parsing logic there, separation of concerns is not followed, which is bad. If the maintainers are not persuaded, that is still bad, because people's time was wasted.
2. some.other.module is part of the same project, in which case the CLI developer is much more likely to decide, and be allowed, to modify some.other.module. But even within a single project, the separation of concerns principle still applies: the CLI parsing logic should be self-contained and not spread out over a bunch of modules whose purpose has nothing to do with CLI parsing.

Most of this is aiming to provide a gradual path from CLI to full ArgumentParser control

Is the alternative I gave, parser = capture_parser(lambda: CLI(main)), not good enough? The only additional knowledge needed is parse_args() and maybe instantiate_classes(). This isn't much, and it could be added to the documentation where the CLI function is explained.

Apr 04 '23 10:04 mauvilsa