
Discrete Burr XII and Skew-T distributions

Open · alejandroschuler opened this issue on Jan 28, 2020 · 6 comments

Moving discussion here from https://github.com/stanfordmlgroup/ngboost/issues/60#issuecomment-577779248

@cooberp @kmedved here is some scaffold code that I'm hoping you folks can fill in/modify to get your distributions up and running:

from ngboost.distns import Distn
import numpy as np

class Generalized3ParamBurrXIIDiscrete(Distn):

    n_params = 3 

    def __init__(self, params):
        self.params_ = params # all real numbers 
        self.mu, self.sigma, self.nu = np.exp(params) # transform to positives

    def sample(self, m):
        """
        Code to draw m random samples from the distribution. Each sample will have 
        n = len(self) = len(self.mu) elements, so the output should be m x n        
        """
        return None

    @staticmethod
    def fit(Y):
        """
        Code to fit a *single* Discrete Burr XII distribution to data Y. 
        Needs to return the internal parameters [log(mu), log(sigma), log(nu)]
        in a numpy array (these are what get passed to __init__).

        This is just for initialization of the NGBoost algorithm, so it doesn't need to be perfect. 
        Ballpark is good enough.
        """
        log_mu, log_sigma, log_nu = None, None, None
        return np.array([log_mu, log_sigma, log_nu])

    # log score methods
    def nll(self, Y): 
        """
        Negative log-likelihood (per observation). 
        Returns a vector of length n = len(self)
        """
        # pmf: P(Y = y) = S(y) - S(y+1), where S(t) = (1 + (t/mu)**sigma)**(-nu)
        return -np.log(((1 + (Y/self.mu)**self.sigma)**(-self.nu)) - ((1 + ((Y+1)/self.mu)**self.sigma)**(-self.nu)))

    def D_nll(self, Y):
        """
        Returns the derivative of self.nll() with respect to each of the real-valued 
        parameters [log(mu), log(sigma), log(nu)]. 

        These can be easily calculated using, e.g., wolframalpha, and efficiently implemented here.
        """
        d_log_mu = np.zeros_like(self.mu)        # fill in: d nll / d log(mu)
        d_log_sigma = np.zeros_like(self.sigma)  # fill in: d nll / d log(sigma)
        d_log_nu = np.zeros_like(self.nu)        # fill in: d nll / d log(nu)
        return np.array([d_log_mu, d_log_sigma, d_log_nu])

This is for the Discrete Burr XII, but the equivalent code should also serve to implement your skew-t distribution.

D_nll() should be easy. Just copy-paste the nll into wolframalpha, edit the variable names, and ask for derivatives. Copy-paste back to D_nll, re-edit the variable names, and call it a day.
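One wrinkle: wolframalpha will hand you derivatives with respect to mu, sigma, and nu, while D_nll() wants them with respect to the log-parameters. The chain rule bridges the gap:

d nll / d log(theta) = theta * (d nll / d theta),   for theta in {mu, sigma, nu}

so you can differentiate with respect to the natural parameters and just multiply each result by the corresponding parameter afterwards.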

The biggest challenge, I think, will be implementing the sample() method, which is necessary if you don't want to derive/implement the Fisher Information. I was working on this myself but didn't have luck with anything simple. As you know, the distribution isn't already implemented in scipy.stats or another python package. scipy.stats does have a Burr XII, which I hoped to sample from and then use np.floor() on to get the discrete version, but then I noticed that what they call Burr XII has a different number of parameters than the pmf you gave me, which I think corresponds to some 3-parameter "generalized" version of the (discretized) Burr XII... All of which is fine, if that's what you want, but the upshot is that there isn't a pre-implemented version to sample from or an easy way to make one.

On the other hand, I don't think this is at all an insurmountable challenge. Making a sampling algorithm for a "custom" distribution is fairly straightforward using, e.g., inverse transform sampling. All you need to do is calculate the inverse CDF (use wolframalpha or whatever) and implement that. And if that doesn't work, there are other methods. All in all, still probably easier than deriving the Fisher.
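To make that concrete, here's a rough (untested) sketch of what sample() could look like via inverse transform sampling, assuming the parameterization implied by the nll above, i.e. survival function S(y) = (1 + (y/mu)^sigma)^(-nu): draw uniforms, invert the continuous CDF, then floor.

    def sample(self, m):
        """
        Inverse transform sampling: invert the continuous Burr XII CDF
        F(x) = 1 - (1 + (x/mu)**sigma)**(-nu), then floor to discretize.
        Returns an m x n array, n = len(self.mu).
        """
        u = np.random.uniform(size=(m, len(self.mu)))
        # x = F^{-1}(u) = mu * ((1 - u)^(-1/nu) - 1)^(1/sigma)
        x = self.mu * ((1 - u) ** (-1 / self.nu) - 1) ** (1 / self.sigma)
        return np.floor(x)

Flooring works here because P(floor(X) = y) = S(y) - S(y+1), which is exactly the pmf encoded in nll().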

The fit() method is also not necessarily trivial, but feel free to use whatever heuristics you want since it's just for initialization. Or go wild and implement/call some optimization method of your choosing.
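For instance, something like the following could serve as the body of fit() (a sketch, not tested; the starting values and the 1e-12 floor inside the log are arbitrary choices on my part):

    import numpy as np
    from scipy.optimize import minimize

    def fit(Y):
        """
        Rough MLE for a single Discrete Burr XII, just for initialization.
        Optimizes on the log scale so the search is unconstrained, and
        returns the internal (log-scale) parameters.
        """
        def neg_log_lik(log_params):
            mu, sigma, nu = np.exp(log_params)
            pmf = (1 + (Y / mu) ** sigma) ** (-nu) - (1 + ((Y + 1) / mu) ** sigma) ** (-nu)
            return -np.sum(np.log(np.clip(pmf, 1e-12, None)))

        start = np.log([np.mean(Y) + 1, 1.0, 1.0])  # crude ballpark start
        return minimize(neg_log_lik, start, method="Nelder-Mead").x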

I haven't looked into skew-T that closely, but the same general ideas should apply. And if the proper sample() method is already implemented somewhere, your job will be easier.

Since this is all a bit of a challenge, and these aren't distributions that many others are likely to use, I'm hoping you two (and/or your collaborators) can give this a shot. But please do let me know where/if you get stuck and I'll jump in to rescue as necessary!

alejandroschuler · Jan 28 '20

Thank you very much, Alejandro. I'm a bit swamped at the moment—I'm getting married next month!—but will eagerly take a crack at this as soon as life returns to normal. I really, really appreciate your help and effort at incorporating these idiosyncratic distributions into your package.

cooberp · Jan 28 '20

Congratulations! Best of luck with all the prep.

I'll look forward to hearing about your progress in a little while. Don't hesitate to reach out.

alejandroschuler · Jan 28 '20

Just a heads up that the developer interface for implementing new distributions has been streamlined a bit; you can see an example in our new developer guide.

alejandroschuler · Feb 04 '20

Updated link to the dev guide https://stanfordmlgroup.github.io/ngboost/5-dev.html

ryan-wolbeck · Aug 20 '20

Thanks @ryan-wolbeck. This remains something I'm super interested in (any skewed regression distributions would be a huge step), but having taken a shot, the math is over my head for now. I'll keep trying to prod @anpatton or @cooberp into it though. :-)

kmedved · Aug 20 '20

Are there by any chance any updates on implementing these distributions? Would it be possible to use the nct distribution from scipy.stats for the skewed Student-t?

Hid83 · Nov 10 '23