feature request: Function to get distributional from family
Hi!
I found your package and have been enjoying it!
In my use case, I would like to be able to get a distributional object corresponding to a specified family. I have written a small piece of code (see below) within my package to enable this, but I wondered if it wouldn't be of interest to have implemented in the package.
What do you think?
get_distributional <- function(object) {
UseMethod("get_distributional")
}
get_distributional.default <- function(object) {
switch(object,
gaussian = getFromNamespace("dist_normal", ns = "distributional"),
binomial = getFromNamespace("dist_binomial", ns = "distributional"),
poisson = getFromNamespace("dist_poisson", ns = "distributional"),
stop("Unknown family: ", family))
}
get_distributional.family <- function(object) {
object <- object$family
NextMethod("get_distributional")
}```
Are you trying to obtain the function corresponding to a given family, or characterise a complete distribution?
I can see some value in having an as_distributional(family(), parameters()) function to restructure many distributions when loading them from a file.
How do you use the get_distributional() function in your packages?
Thanks for the quick response.
I was messing around with doing some likelihood calculation "manually" while doing some MCMC sampling in connection with some bayesian GLM. In that task I have a function with usual formula, data, family arguments and wanted to use my specification of family to determine the density I should use for the likelihood calculation without needing a bunch of ifs for each family.
However, for this use case I still need to reparametrise each density to accomodate the evaluation of it for my likelihood, meaning I think it will actually be easier to just revert to simple if checks. And if I do go this route, I might have to create custom functions for the reparametrisation anyway and would not have use of the as_distributional function.
So to answer your question: Yes, I am trying to characterise the distribution, but from a bit more pondering I don't think I would end up using the implementation for my current purpose. That said, an as_distributional S3 generic could probably come in handy in some scenarios, so it's up to you if you want to close this issue or leave it for some time in the future :)
Perhaps some conversion from family() -> distributional would be useful then, although you would still need the predicted values (or generally the parameters) to parameterise the distribution.
I'm not quite sure how reparameterising each density conflicts with an as_distribution() function - could you elaborate on how your use case works with if checks but not with as_distributional?
Hmm I guess an as_distributional function with a family method could still be useful - just in conjunction with some "custom" code. By reparametrisation I mean that if I want to write some code for my log-likelihood calculation that should work for both fx. a guassian density and a binomial density, I need the function that fetches the density based on the family to "have the same first argument". Meaning that when I do my linear prediction eta and transform it to mu using the inverse link function, I want some pseudocode to illustrate the idea to look something like so:
log_likelihood <- function(formula, data, family, extra_args = list(prior_sigma) {
# Doing stuff to prepare data and calculate my `mu`
dist <- as_distributional(family, mu, extra_args)
return(sum(density(dist, Y, log = T)))
}
And for this to work, the binomial density needs to take as first argument the prop parameter, so prop will be equal to mu in the distribution, the size parameter should be set to 1 (it's not possible to input a vector of values mu to the bernoulli density), and it should be able to take in something like extra_args to allow for additional arguments for other families (like an sd for the gaussian case). So even with as_distributional I would have to do something like so:
density_parametrised_by_mu_glm <- function(family) {
switch(family$family,
gaussian = function(main_parameter, ...)
as_distributional(family, mean = main_parameter, ...),
binomial = function(main_parameter, ...)
as_distributional(family, prob = main_parameter, size = 1, ...)
)
}
So that's my "issue" with having to "reparametrise the densities" even if there was an as_distributional function - though it could still help. I'm still a novice R programmer and package developer, so I might be missing a much more elegant solution to my problem 😅
You might like to use distributional::log_likelihood(), which essentially does the same as your implementation (sum(density(dist, Y, log = T))).
As for handling size = 1 for the binomial family, I feel like this is better handled in your package. Very similar to your second function here, I would suggest writing a function that gives the appropriate distributional object (that can then be used with log_likelihood() and other computation methods).
library(distributional)
pred_dist <- function(family, eta) {
# Compute the predicted mean using the inverse link function
mu <- family$linkinv(eta)
# Switch for density calculation based on family
switch(family$family,
"gaussian" = dist_normal(mean = mu, sd = ???),
"binomial" = dist_binomial(size = 1, prob = mu),
"poisson" = dist_poisson(lambda = mu),
"inverse.gaussian" = dist_inverse_gaussian(mu, ???)
stop(paste("Density calculation not implemented for family:", family$family))
)
}
I leave it to you to handle the variances / dispersion / etc.
Ah okay, I didn't notice the existence of the log_likelihood function, I'll use that together with something like the above.
Thanks for engaging with my problem!
I'll leave it up to you whether to close the issue or keep it open in case it could be interesting to look into an implementation of as_distributional in the future.