liesel
liesel copied to clipboard
Ideas for the next level of the DistRegBuilder
What we want to do:
- [ ] Move distregbuilder to its own module
- [ ] Move some
liesel.model.legacy
functionality over to the new distregbuilder module - [ ] rename
add_predictor
(see below) - [ ] make predictor list public
- [ ] increase modularity by the use of groups
Some notes:
- There's still some usage of
liesel.legacy
helper functions in the distreg module, such asliesel.legacy.Predictor
. I'd like to discuss these in https://github.com/liesel-devs/liesel-internal/issues/252, so I would like to wait with removing their current usage until then. - Increase modularity by creating specialized groups for model components. This is something that I am confident make sense.
- Experiment with even more declarative ways to define a model. This is more in the category of playing around.
- Allow the Distregbuilder to be initialized with a data-mapping (which may be, for instance, a
dict
or apd.DataFrame
). Then the model-building methods can refer to arrays in the mapping by their key. -
add_predictor
➡️add_distparam
or something similar - Move distregbuilder to its own module
- Move some
liesel.model.legacy
functionality over to the new distregbuilder module
Example for (2)
This group bundles nodes for representing a variance parameter with an InverseGamma prior. Some things to keep in mind:
- It might be nice to create a meta-structure for some "families of groups". For example, there could be a meta-class for groups representing variance parameters. This meta-class could enforce that the variance-parameter-group must have a
var_param
property. Then other functions/groups could just accept a generic variance-parameter group and be certain that the variance parameter will be available under the right name. - Specialized groups could make some information readily available for goose. For example, there could be a method
Group.gibbs_kernels()
, returning a list of Gibbs Kernels for the parameters in this group. There could also be a list of the names of the parameters in the group that need to be sampled.
class IGVariance(Group):
"""
A variance parameter with an inverse gamma prior.
Parameters
----------
name
Name of the group.
a
Concentration or rate parameter of the inverse gamma prior.
b
Scale parameter of the inverse gamma prior.
start_value
Initial value of the variance parameter.
"""
def __init__(self, name: str, a: float, b: float, start_value: float = 100.0):
self.a = Var(a, name=name + "_a")
"""Concentration/rate parameter of the inverse gamma prior."""
self.b = Var(b, name=name + "_b")
"""Scale parameter of the inverse gamma prior."""
prior = Dist(tfd.InverseGamma, concentration=self.a, scale=self.b)
self.var_param = Param(start_value, prior, name + "_var_param")
"""The variance parameter."""
super().__init__(name=name, var_param=self.var_param, a=self.a, b=self.b)
Example for (3)
One possibility for (3) would be to make use of the __iadd__
special method. This would enable us to write something like the following:
drb = dr.DistRegBuilder()
drb += dr.Response(y, tfd.Normal)
drb += dr.Predictor("loc", tfb.Identity)
drb.loc += dr.Intercept()
drb.loc += dr.NPSmooth(...)
Example for (4)
y = np.random.normal(size=100)
x = np.random.uniform(size=100)
data = {"y": y, "x": x}
drb = dr.DistRegBuilder(data)
drb.add_response("y", distribution=tfd.Normal)
...
drb.add_p_smooth("x", m=0.0, s=10.0)
Just a couple of minor things:
- I would not move the
legacy
module. It is not strictly related todistreg
; - I like the name
add_predictor
but as @wiep said, I don't know if it is a good idea to nameadd_distparam
.
Thanks for the comments @GianmarcoCallegher ! :)
- I would not move the
legacy
module. It is not strictly related todistreg
;
I think some of its functionality is more closely related than others - for example SmoothingParam
is more closely related, while Addition
is less closely related. But SmoothingParam
would probably not continue to be used in up-to-date code. I guess we do need to pay a bit more attention to how we deal with this functionality instead of simply moving it all over.
My main issue with the legacy module is that it has the air of deprecation and being there only for backwards-compatibility. But some things are just nice to have like the Predictor
helper. A lot of others probably do not need to be transferred.
- I like the name
add_predictor
but as @wiep said, I don't know if it is a good idea to nameadd_distparam
.
I see the point and am also ok with leaving it as it is.
i wonder if this a good time to create the contrib module and place distreg there