pints icon indicating copy to clipboard operation
pints copied to clipboard

Allow setting parameter names for models?

Open chonlei opened this issue 5 years ago • 25 comments

Currently when we output parameters, either in pints.plot or in pints.io, we call parameters as "p0","p1",... or "parameter 1", "parameter 2", ...

First the numbering/naming is not consistent. Also maybe we can allow model to have

class ForwardModel(object):
    """
    Defines an interface for user-supplied forward models.
    Classes extending ``ForwardModel`` can implement the required methods
    directly in Python or interface with other languages (for example via
    Python wrappers around C code).
    """

    def __init__(self):
        super(ForwardModel, self).__init__()
        self.parameter_names = None
    
    ...

    def set_parameter_names(self, names):
        self.parameter_names = names

    def parameter_names(self):
        if self.parameter_names is None:
            return ['p' + str(i) for i in self.n_parameters()]
        else:
            return self.parameter_names

Or something similar, then when we output the parameters, we can refer to user definited parameter names.

Later, the user can still make sense of the parameter outputs without relying on the specific order of the parameters that they may have forgotten...

chonlei avatar May 02 '19 12:05 chonlei

Good idea. Would be nice if plots could use this too, I spend quite a lot of time going back and forth to work out what is what!

mirams avatar May 02 '19 13:05 mirams

Yeah, that's what I hope to get it to do too, so that both plotting with pints.plot or storing with pints.io can use the user definited parameter names.

chonlei avatar May 02 '19 13:05 chonlei

Sounds good, make a branch for it!

Would suggest something like this for the implementation though:

class ForwardModel(object):
    """
    Defines an interface for user-supplied forward models.
    Classes extending ``ForwardModel`` can implement the required methods
    directly in Python or interface with other languages (for example via
    Python wrappers around C code).
    """

    def parameter_names(self):
        return ['p' + str(i) for i in range(self.n_parameters())]

so that there's a default implementation that users can choose to override

MichaelClerx avatar May 02 '19 13:05 MichaelClerx

Looking at pints.io and pints.plot, they usually do not require the model as part of the input, e.g. we do pints.plot.trace(samples) which becomes independent of the model, so as for pints.io.save_samples etc. The only one that has model as part of the input is pints.plot.series.

So adding parameter names inside the model definition wouldn't help in this case...

@MichaelClerx @mirams Any suggestions? (My bad coding habit would be setting some global variable, which obviously is a big no no here...)

chonlei avatar May 05 '19 17:05 chonlei

Alright, so couple of ways to think about this

  1. Thinking in objects: The model object is the obvious candidate to own the variable names, right? So we could stick the names in the model, and then any function needing names would take an (optional) model argument

  2. Asking, "how can we hack this in with the least amount of hassle": This might lead you to just give a few methods a new argument parameter_names (a list of strings). Or your global hack, but then what do you do if you have two models? You could end up defining a VariableNames object, but then that's just another variable people need to keep track of...

Going down route 1 a bit further, what happens if you don't have a model, just a PDF? Perhaps the Problem classes and all Errors / LogLikelihoods should have a parameter_names() method as well! Then there's quite a few objects that have both n_parameters and parameter_names, so perhaps we should create an interface ParameterOwner that all these classes should implement? Then plotting methods could have an optional argument parameter_owner (or something snappier)

MichaelClerx avatar May 06 '19 07:05 MichaelClerx

Maybe ObjectWithParameters would be more descriptive

MichaelClerx avatar May 06 '19 07:05 MichaelClerx

This sounds related to the idea about parameters knowing whether they are transformed or not, so possible to make parameters into a smarter but still lightweight class that knows names too.

But in the interests of simplicity, optional argument of parameter_names which user can get based on the model's definition sounds easiest. If we also want the transforms available we can revisit?

mirams avatar May 07 '19 06:05 mirams

I did like @MichaelClerx 's route 1, but the only thing I worried was it might be a bit overkill? (I think PyMC3 has a similar structure for the model parameters, and that combines with the sampling output, so that the output itself has parameter names attached to it already -- which I am not too sure if it is necessary here?)

And yes, for the interests of simplicity, I guess, route 2's

This might lead you to just give a few methods a new argument parameter_names (a list of strings).

is the easiest -- but user has to put that extra parameter_names argument everywhere...

chonlei avatar May 07 '19 18:05 chonlei

Going down route 1 a bit further, what happens if you don't have a model, just a PDF? Perhaps the Problem classes and all Errors / LogLikelihoods should have a parameter_names() method as well! Then there's quite a few objects that have both n_parameters and parameter_names, so perhaps we should create an interface ParameterOwner that all these classes should implement? Then plotting methods could have an optional argument parameter_owner (or something snappier)

This can get quite complicated but useful at the same time, e.g. all the outputs in PINTS can somehow share that and all the output can have parameter names (and all these kinds of information). However the problem is, which I didn't like and I am still not sure if I like it, the output is no longer a simple numpy array that everyone knows... it would become some object of PINTS, and user has to learn how to access to the actual result -- i.e. useful but a steeper learning curve for PINTS(?).

chonlei avatar May 07 '19 18:05 chonlei

Just had a look at pints.io and won't be easy to do that there*, so just start with giving the plotting methods an optional argument parameter_names

*Because we've got def save_samples(filename, *sample_lists): so you can't add a keyword argument without making it backwards compatible

MichaelClerx avatar May 07 '19 18:05 MichaelClerx

This can get quite complicated but useful at the same time, e.g. all the outputs in PINTS can somehow share that and all the output can have parameter names (and all these kinds of information). However the problem is, which I didn't like and I am still not sure if I like it, the output is no longer a simple numpy array that everyone knows... it would become some object of PINTS, and user has to learn how to access to the actual result -- i.e. useful but a steeper learning curve for PINTS(?).

No the output would just a numpy array, with numbers.

But you could get names for each of those numbers from ParamOwner.parameter_names()

But let's not go there just now!

MichaelClerx avatar May 07 '19 18:05 MichaelClerx

so just start with giving the plotting methods an optional argument parameter_names

@chonlei do you intend to pick this up again?

MichaelClerx avatar Sep 24 '19 14:09 MichaelClerx

Happy to, just wasn't sure if we have had a clear conclusion what to do yet...

chonlei avatar Sep 28 '19 19:09 chonlei

Two options:

  1. If you really want it: add some options to the plotting functions to also accept a list of parameter names. Add an optional method parameter_names() to ToyModel and ToyLogPDF, or maybe even to ForwardModel and LogPDF.
  2. If you're not that interested anymore, just close the ticket

Up to you :D

MichaelClerx avatar Sep 28 '19 19:09 MichaelClerx

I like the parameter names idea! Especially if/when we make a sampling result object...

On 28 Sep 2019, at 20:24, Michael Clerx [email protected] wrote:

Two options:

If you really want it: add some options to the plotting functions to also accept a list of parameter names. Add an optional method parameter_names() to ToyModel and ToyLogPDF, or maybe even to ForwardModel and LogPDF. If you're not that interested anymore, just close the ticket Up to you :D

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

ben18785 avatar Sep 28 '19 21:09 ben18785

Stop it with the results object :D :D We can do anything a results object does with appropriate methods in the controller. There's examples already for optimisation, and several other issues where we discuss this!

MichaelClerx avatar Sep 28 '19 21:09 MichaelClerx

OK, I'm deciding this one as "we're doing parameter names as optional inputs to plots and diagnostic printing methods".

MichaelClerx avatar Mar 19 '20 10:03 MichaelClerx

If we did add a parameter_names method to models at some point, it's worth noting that this wouldn't include names from the error model, so we'd have to have a parameter_names() on the Problem classes as well (that can ask a model if it has one), and then on the ProblemLogPDF and ProblemBasedError classes

MichaelClerx avatar May 01 '20 07:05 MichaelClerx

You know what this is still a nice idea :D

MichaelClerx avatar Jun 08 '21 15:06 MichaelClerx

After 2 whole years? 😄 Has anyone started working on it? I suppose I'd be asking @DavAug?

chonlei avatar Jun 17 '21 04:06 chonlei

@MichaelClerx @chonlei Yes I agree! Haha I absolutely did not. 😂 But I did implement this in erlotinib, where I am basically doing exactly what Michael suggests. Models can specify names, if they are not provided the parameters are just enumerated. LogLikelihoods have names for their parameters (in my case I have an intermediate ErrorModel class), so when a LogLikelihood is instantiated it will know the names of all parameters.

By implementing a get_parameter_names method for loglikelihoods, we will always be able to identify the parameters and worst case scenario the names are the somewhat meaningless defaults: `['parameter 1', 'parameter 2', ..., 'parameter k', 'noise parameter 1', etc. ]'.

I don't think I'll have time to do this any time soon, but it seems like a very good beginners issue for someone who would liek to start working on pints.

DavAug avatar Jun 17 '21 09:06 DavAug

So just to clarify what I am suggesting is to make the get_parameter_names method mandatory for the ForwardModel's (potentially define a default method in the base classes that just enumerates the parameter names, so users don't have to worry about it, if they don't want to). All classes that work with the ForwardModel can then derive the parameter names from the ForwardModel provided that the LogLikelihoods know what to call their additional parameters.

DavAug avatar Jun 17 '21 09:06 DavAug

That's cool - yeah very similar to what was suggested before (see the top of this thread).

I was also hoping if we could make it a bit more sophisticated that plotting (e.g. the traces, histograms, and pairwise plots) can take these ForwardModel parameter names as well... But that's harder because plotting methods take only the samples (numpy array) but not anything related to ForwardModel. So back to the argument of going for Route 1 or Route 2 above, but I guess now that the plotting methods take parameter names separately (making things a bit disconnected), we'd just go for Route 2. Although I suppose if we do want to go half way from Route 2 to Route 1, we could do something like subclassing ndarray, a structured array, or a pandas data structure for storing the parameter names when outputting the samples from the controller (which can be passed to the plotting methods)...

chonlei avatar Jun 17 '21 10:06 chonlei

I know that arviz uses xarray as data structure which does exactly what you say. The controller could then optionally return a xarray.DataArray or xarray.Dataset instead of a numpy array. Ultimately it would then make sense to refactor the plots or just entirely outsource this to arviz.

DavAug avatar Jun 17 '21 12:06 DavAug

For plots, an easy option would be to allow objects like a model to be passed in as the parameter_names argument. Perhaps we'd have a subclass ParameterOwner or something like that, that could be extended by models, score functions, etc.

MichaelClerx avatar Jun 17 '21 12:06 MichaelClerx