pints
pints copied to clipboard
Allow setting parameter names for models?
Currently when we output parameters, either in pints.plot
or in pints.io
, we call parameters as "p0","p1",... or "parameter 1", "parameter 2", ...
First the numbering/naming is not consistent. Also maybe we can allow model to have
class ForwardModel(object):
"""
Defines an interface for user-supplied forward models.
Classes extending ``ForwardModel`` can implement the required methods
directly in Python or interface with other languages (for example via
Python wrappers around C code).
"""
def __init__(self):
super(ForwardModel, self).__init__()
self.parameter_names = None
...
def set_parameter_names(self, names):
self.parameter_names = names
def parameter_names(self):
if self.parameter_names is None:
return ['p' + str(i) for i in self.n_parameters()]
else:
return self.parameter_names
Or something similar, then when we output the parameters, we can refer to user definited parameter names.
Later, the user can still make sense of the parameter outputs without relying on the specific order of the parameters that they may have forgotten...
Good idea. Would be nice if plots could use this too, I spend quite a lot of time going back and forth to work out what is what!
Yeah, that's what I hope to get it to do too, so that both plotting with pints.plot
or storing with pints.io
can use the user definited parameter names.
Sounds good, make a branch for it!
Would suggest something like this for the implementation though:
class ForwardModel(object):
"""
Defines an interface for user-supplied forward models.
Classes extending ``ForwardModel`` can implement the required methods
directly in Python or interface with other languages (for example via
Python wrappers around C code).
"""
def parameter_names(self):
return ['p' + str(i) for i in range(self.n_parameters())]
so that there's a default implementation that users can choose to override
Looking at pints.io
and pints.plot
, they usually do not require the model as part of the input, e.g. we do pints.plot.trace(samples)
which becomes independent of the model, so as for pints.io.save_samples
etc. The only one that has model
as part of the input is pints.plot.series
.
So adding parameter names inside the model definition wouldn't help in this case...
@MichaelClerx @mirams Any suggestions? (My bad coding habit would be setting some global variable, which obviously is a big no no here...)
Alright, so couple of ways to think about this
-
Thinking in objects: The model object is the obvious candidate to own the variable names, right? So we could stick the names in the model, and then any function needing names would take an (optional)
model
argument -
Asking, "how can we hack this in with the least amount of hassle": This might lead you to just give a few methods a new argument
parameter_names
(a list of strings). Or your global hack, but then what do you do if you have two models? You could end up defining aVariableNames
object, but then that's just another variable people need to keep track of...
Going down route 1 a bit further, what happens if you don't have a model, just a PDF? Perhaps the Problem
classes and all Errors / LogLikelihoods should have a parameter_names()
method as well! Then there's quite a few objects that have both n_parameters
and parameter_names
, so perhaps we should create an interface ParameterOwner
that all these classes should implement? Then plotting methods could have an optional argument parameter_owner
(or something snappier)
Maybe ObjectWithParameters
would be more descriptive
This sounds related to the idea about parameters knowing whether they are transformed or not, so possible to make parameters into a smarter but still lightweight class that knows names too.
But in the interests of simplicity, optional argument of parameter_names
which user can get based on the model's definition sounds easiest. If we also want the transforms available we can revisit?
I did like @MichaelClerx 's route 1, but the only thing I worried was it might be a bit overkill? (I think PyMC3 has a similar structure for the model parameters, and that combines with the sampling output, so that the output itself has parameter names attached to it already -- which I am not too sure if it is necessary here?)
And yes, for the interests of simplicity, I guess, route 2's
This might lead you to just give a few methods a new argument parameter_names (a list of strings).
is the easiest -- but user has to put that extra parameter_names
argument everywhere...
Going down route 1 a bit further, what happens if you don't have a model, just a PDF? Perhaps the Problem classes and all Errors / LogLikelihoods should have a parameter_names() method as well! Then there's quite a few objects that have both n_parameters and parameter_names, so perhaps we should create an interface ParameterOwner that all these classes should implement? Then plotting methods could have an optional argument parameter_owner (or something snappier)
This can get quite complicated but useful at the same time, e.g. all the outputs in PINTS can somehow share that and all the output can have parameter names (and all these kinds of information). However the problem is, which I didn't like and I am still not sure if I like it, the output is no longer a simple numpy array that everyone knows... it would become some object of PINTS, and user has to learn how to access to the actual result -- i.e. useful but a steeper learning curve for PINTS(?).
Just had a look at pints.io
and won't be easy to do that there*, so just start with giving the plotting methods an optional argument parameter_names
*Because we've got def save_samples(filename, *sample_lists):
so you can't add a keyword argument without making it backwards compatible
This can get quite complicated but useful at the same time, e.g. all the outputs in PINTS can somehow share that and all the output can have parameter names (and all these kinds of information). However the problem is, which I didn't like and I am still not sure if I like it, the output is no longer a simple numpy array that everyone knows... it would become some object of PINTS, and user has to learn how to access to the actual result -- i.e. useful but a steeper learning curve for PINTS(?).
No the output would just a numpy array, with numbers.
But you could get names for each of those numbers from ParamOwner.parameter_names()
But let's not go there just now!
so just start with giving the plotting methods an optional argument parameter_names
@chonlei do you intend to pick this up again?
Happy to, just wasn't sure if we have had a clear conclusion what to do yet...
Two options:
- If you really want it: add some options to the plotting functions to also accept a list of parameter names. Add an optional method
parameter_names()
to ToyModel and ToyLogPDF, or maybe even to ForwardModel and LogPDF. - If you're not that interested anymore, just close the ticket
Up to you :D
I like the parameter names idea! Especially if/when we make a sampling result object...
On 28 Sep 2019, at 20:24, Michael Clerx [email protected] wrote:
Two options:
If you really want it: add some options to the plotting functions to also accept a list of parameter names. Add an optional method parameter_names() to ToyModel and ToyLogPDF, or maybe even to ForwardModel and LogPDF. If you're not that interested anymore, just close the ticket Up to you :D
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
Stop it with the results object :D :D We can do anything a results object does with appropriate methods in the controller. There's examples already for optimisation, and several other issues where we discuss this!
OK, I'm deciding this one as "we're doing parameter names as optional inputs to plots and diagnostic printing methods".
If we did add a parameter_names
method to models at some point, it's worth noting that this wouldn't include names from the error model, so we'd have to have a parameter_names() on the Problem
classes as well (that can ask a model if it has one), and then on the ProblemLogPDF
and ProblemBasedError
classes
You know what this is still a nice idea :D
After 2 whole years? 😄 Has anyone started working on it? I suppose I'd be asking @DavAug?
@MichaelClerx @chonlei Yes I agree! Haha I absolutely did not. 😂 But I did implement this in erlotinib, where I am basically doing exactly what Michael suggests. Models can specify names, if they are not provided the parameters are just enumerated. LogLikelihoods have names for their parameters (in my case I have an intermediate ErrorModel class), so when a LogLikelihood is instantiated it will know the names of all parameters.
By implementing a get_parameter_names
method for loglikelihoods, we will always be able to identify the parameters and worst case scenario the names are the somewhat meaningless defaults: `['parameter 1', 'parameter 2', ..., 'parameter k', 'noise parameter 1', etc. ]'.
I don't think I'll have time to do this any time soon, but it seems like a very good beginners issue for someone who would liek to start working on pints.
So just to clarify what I am suggesting is to make the get_parameter_names
method mandatory for the ForwardModel's (potentially define a default method in the base classes that just enumerates the parameter names, so users don't have to worry about it, if they don't want to). All classes that work with the ForwardModel can then derive the parameter names from the ForwardModel provided that the LogLikelihoods know what to call their additional parameters.
That's cool - yeah very similar to what was suggested before (see the top of this thread).
I was also hoping if we could make it a bit more sophisticated that plotting (e.g. the traces, histograms, and pairwise plots) can take these ForwardModel parameter names as well... But that's harder because plotting methods take only the samples (numpy array) but not anything related to ForwardModel. So back to the argument of going for Route 1 or Route 2 above, but I guess now that the plotting methods take parameter names separately (making things a bit disconnected), we'd just go for Route 2. Although I suppose if we do want to go half way from Route 2 to Route 1, we could do something like subclassing ndarray, a structured array, or a pandas data structure for storing the parameter names when outputting the samples from the controller (which can be passed to the plotting methods)...
I know that arviz uses xarray as data structure which does exactly what you say. The controller could then optionally return a xarray.DataArray or xarray.Dataset instead of a numpy array. Ultimately it would then make sense to refactor the plots or just entirely outsource this to arviz.
For plots, an easy option would be to allow objects like a model to be passed in as the parameter_names
argument. Perhaps we'd have a subclass ParameterOwner
or something like that, that could be extended by models, score functions, etc.