pytorch-lightning Whether clarification/documentation/redesign is needed for customizing LightningCLI subcommands

The LightningCLI documentation does not mention any case in which trainer methods (CLI subcommands) are customized. It would be good to include it so that people are more aware of a recommended way to do this.

On the other hand, the LightningCLI class currently has an undocumented feature that supports the implementation of methods with the same name as the Trainer class, see https://github.com/PyTorchLightning/pytorch-lightning/pull/11714#discussion_r798721175. Despite that discussion, still to me it seems weird that return_predictions option is there by default. It does nothing unless you implement the LightningCLI.predict. Documenting this would not change this weirdness. Furthermore, if documented it would seem that this is the recommended way to customize a Trainer method. But this has the limitation that parameters added e.g. to LightningCLI.predict, would not be exposed as command line options. I think there is a need for some redesign and only document once it is clear what are the recommended ways to implement the most common use cases. This github issue could be used for the discussion about this posible redesign.

I am not convinced that it is a good idea that to customize Trainer methods, people implement methods with the same name in the LightningCLI class. I think a better way to do this is to subclass Trainer and pass this to the trainer_class init parameter of LightningCLI. One reason I meantioned before is that new parameters of the method would become CLI options. Another reason is that these kind of customizations can be useful without using LightningCLI, so recommending to implement this logic in a subclass of LightningCLI would go against it. An one more reason, it is more natural to customize the method where it was originally defined than in a different class.

Another thought I have is that by default, as far as possible, all LightningCLI subcommands and options should do something obvious/useful, and not have options that don't do anything. What could be the default behavior for predict? I haven't used predict, so I could be misinterpreting its purpose. But I would expect as default one of two possibilities. The first is the predictions are always printed to stdout. This would imply that return_predictions is not a CLI option. The second possibility is that predict has an --output option which could support stdout, a local file or even some remote location. Again return_predictions wouldn't be a CLI option, though internally its value could depend on the --output value. If someone wants a behavior different from the default, then some implementation is needed.

To anayze whether a redesign is needed and come up with a good solution, would be good to list a few use cases and discuss how this could/should be implemented. I give a few that come to mind.

Implement predict such that it is possible to select the output format, e.g. json or pickle.
Add prediction parameters which are not related to model training. Examples:
- Whether the output should or not include confidences.
- For a classification task, to output the best class or all classes ordered by decreasing probability.
- For some decoding task, set the number of n-best predictions.
CLI options to specify how to and/or where to save the predictions in the case that return_predictions=False.

cc @borda @rohitgr7 @tchaton @justusschock @awaelchli @carmocca @mauvilsa

Mar 24 '22 06:03 mauvilsa

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

Apr 24 '22 04:04 stale[bot]

@carmocca any comment about these thoughts?

Apr 24 '22 06:04 mauvilsa

It would be good to include it so that people are more aware of a recommended way to do this.

Fully agree.

am not convinced that it is a good idea that to customize Trainer methods, people implement methods with the same name in the LightningCLI class. I think a better way to do this is to subclass Trainer and pass this to the trainer_class init parameter of LightningCLI

I generally agree. Implementing such methods was useful before the addition of run=False. Before that, it was the only way to control the execution of the trainer methods.

If we remove these CLI methods, then removing return_predictions from the CLI arguments makes sense.

would be good to list a few use cases and discuss how this could/should be implemented. I give a few that come to mind.

I think such changes fit better a different issue as they need to make sense unrelated to predict's use with the LightningCLI

Apr 29 '22 16:04 carmocca