liesel icon indicating copy to clipboard operation
liesel copied to clipboard

Ideas for the next level of the DistRegBuilder

Open jobrachem opened this issue 1 year ago • 3 comments

What we want to do:

  • [ ] Move distregbuilder to its own module
  • [ ] Move some liesel.model.legacy functionality over to the new distregbuilder module
  • [ ] rename add_predictor (see below)
  • [ ] make predictor list public
  • [ ] increase modularity by the use of groups

Some notes:

  1. There's still some usage of liesel.legacy helper functions in the distreg module, such as liesel.legacy.Predictor. I'd like to discuss these in https://github.com/liesel-devs/liesel-internal/issues/252, so I would like to wait with removing their current usage until then.
  2. Increase modularity by creating specialized groups for model components. This is something that I am confident make sense.
  3. Experiment with even more declarative ways to define a model. This is more in the category of playing around.
  4. Allow the Distregbuilder to be initialized with a data-mapping (which may be, for instance, a dict or a pd.DataFrame). Then the model-building methods can refer to arrays in the mapping by their key.
  5. add_predictor ➡️ add_distparam or something similar
  6. Move distregbuilder to its own module
  7. Move some liesel.model.legacy functionality over to the new distregbuilder module

Example for (2)

This group bundles nodes for representing a variance parameter with an InverseGamma prior. Some things to keep in mind:

  • It might be nice to create a meta-structure for some "families of groups". For example, there could be a meta-class for groups representing variance parameters. This meta-class could enforce that the variance-parameter-group must have a var_param property. Then other functions/groups could just accept a generic variance-parameter group and be certain that the variance parameter will be available under the right name.
  • Specialized groups could make some information readily available for goose. For example, there could be a method Group.gibbs_kernels(), returning a list of Gibbs Kernels for the parameters in this group. There could also be a list of the names of the parameters in the group that need to be sampled.
class IGVariance(Group):
    """
    A variance parameter with an inverse gamma prior.

    Parameters
    ----------
    name
        Name of the group.
    a
        Concentration or rate parameter of the inverse gamma prior.
    b
        Scale parameter of the inverse gamma prior.
    start_value
        Initial value of the variance parameter.
    """

    def __init__(self, name: str, a: float, b: float, start_value: float = 100.0):

        self.a = Var(a, name=name + "_a")
        """Concentration/rate parameter of the inverse gamma prior."""

        self.b = Var(b, name=name + "_b")
        """Scale parameter of the inverse gamma prior."""

        prior = Dist(tfd.InverseGamma, concentration=self.a, scale=self.b)
        self.var_param = Param(start_value, prior, name + "_var_param")
        """The variance parameter."""

        super().__init__(name=name, var_param=self.var_param, a=self.a, b=self.b)


Example for (3)

One possibility for (3) would be to make use of the __iadd__ special method. This would enable us to write something like the following:

drb = dr.DistRegBuilder()

drb += dr.Response(y, tfd.Normal)
drb += dr.Predictor("loc", tfb.Identity)

drb.loc += dr.Intercept()
drb.loc += dr.NPSmooth(...)

Example for (4)

y = np.random.normal(size=100)
x = np.random.uniform(size=100)
data = {"y": y, "x": x}

drb = dr.DistRegBuilder(data)

drb.add_response("y", distribution=tfd.Normal)
...
drb.add_p_smooth("x", m=0.0, s=10.0)

jobrachem avatar Mar 15 '23 12:03 jobrachem

Just a couple of minor things:

  • I would not move the legacy module. It is not strictly related to distreg;
  • I like the name add_predictor but as @wiep said, I don't know if it is a good idea to name add_distparam.

GianmarcoCallegher avatar Mar 22 '23 18:03 GianmarcoCallegher

Thanks for the comments @GianmarcoCallegher ! :)

  • I would not move the legacy module. It is not strictly related to distreg;

I think some of its functionality is more closely related than others - for example SmoothingParam is more closely related, while Addition is less closely related. But SmoothingParam would probably not continue to be used in up-to-date code. I guess we do need to pay a bit more attention to how we deal with this functionality instead of simply moving it all over.

My main issue with the legacy module is that it has the air of deprecation and being there only for backwards-compatibility. But some things are just nice to have like the Predictor helper. A lot of others probably do not need to be transferred.

  • I like the name add_predictor but as @wiep said, I don't know if it is a good idea to name add_distparam.

I see the point and am also ok with leaving it as it is.

jobrachem avatar Mar 22 '23 20:03 jobrachem

i wonder if this a good time to create the contrib module and place distreg there

wiep avatar Mar 22 '23 20:03 wiep