Parameter estimators & average log-likelihood
It would be nice to have parameter estimators for the distributions.
In particular, I was looking for an MLE for the beta distribution.
We could make an average log-likelihood function that just returns the log-likelihood of an array of samples, given the distribution, divided by the number of samples.
Then use this to increase the likelihood via something like Newton's method.
Then you have a generic parameter estimator.
Later we can start adding better estimators for each distribution (especially closed-form ones).
(Mark as enhancement, please.)
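To make the idea concrete, here is a minimal sketch of such an average log-likelihood helper. The name and the callable-based signature are just for illustration, not existing MathPHP API:

// Assumes: use MathPHP\Probability\Distribution\Continuous\Normal;
/**
 * Average log-likelihood of $samples under a distribution:
 * (1/n) Σ log pdf(xᵢ)
 *
 * $pdf is any callable that evaluates the distribution's density at a point.
 */
function averageLogLikelihood(callable $pdf, array $samples): float
{
    $sum = 0;
    foreach ($samples as $x) {
        $sum += log($pdf($x));
    }
    return $sum / count($samples);
}

// e.g. scoring a Normal(μ, σ) fit:
$avgLL = averageLogLikelihood(function ($x) use ($μ, $σ) {
    return Normal::pdf($x, $μ, $σ);
}, $data);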
Would it suffice to add something like this to each distribution? Here are MLE and log-likelihood calculations for a normal distribution. Similar methods could be added to others.
// Assumes: use MathPHP\Statistics\Average; use MathPHP\Statistics\Descriptive;
//          use MathPHP\Functions\Map\Single;
/**
 * Use the data in $data to estimate the parameters of the distribution.
 *
 * $data is an array of observations.
 */
public static function MLE(array $data): array
{
    $μ = Average::mean($data);
    // The MLE of σ uses the population (divide-by-n) variance
    $σ = sqrt(Descriptive::populationVariance($data));
    return ['μ' => $μ, 'σ' => $σ];
}
/**
 * Using estimates for the mean and standard deviation, calculate how well
 * the data in $data fits the distribution.
 *
 * $μ is the mean to test
 * $σ is the standard deviation to test
 * $data is an array of observations.
 */
public static function logLikelihood($μ, $σ, array $data): float
{
    $n   = count($data);
    $tau = 2 * \M_PI;
    $sum_dev_squared = array_sum(Single::square(Single::subtract($data, $μ)));
    // ℓ(μ, σ) = -(n/2)(log 2π + log σ²) - Σ(xᵢ - μ)² / (2σ²)
    return -$n / 2 * (log($tau) + log($σ ** 2)) - $sum_dev_squared / (2 * $σ ** 2);
}
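If these were added to the Normal class, usage might look something like:

$data   = [1.2, 0.7, 2.3, 1.8, 0.9];
$params = Normal::MLE($data);                                       // ['μ' => ..., 'σ' => ...]
$ll     = Normal::logLikelihood($params['μ'], $params['σ'], $data); // fit of those estimates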
I was thinking the logLikelihood function could, e.g., just call the class's existing pdf function wherever possible, to keep it as generic as possible.
And I was thinking of making a generic Newton-Raphson iterative estimator for the MLE, to use for distributions with no closed-form solution; a rough sketch follows below. Then, for the ones that do have closed-form solutions, you'd override that with the closed-form solution.
(I'm assuming instantiable classes here, which is not the case right now, so translate as needed.)
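Something like this is what I have in mind for the generic fallback. This is a one-parameter sketch only (a multi-parameter version would need the full gradient and Hessian), and everything in it — the method name, the finite-difference step, the static::pdf($x, $θ) signature — is assumed for illustration; it also needs a reasonable starting value and only finds a local optimum:

/**
 * Generic one-parameter MLE via Newton-Raphson.
 * Maximizes ℓ(θ) = Σ log pdf(xᵢ; θ) by iterating θ ← θ − ℓ′(θ)/ℓ″(θ),
 * with ℓ′ and ℓ″ approximated by central finite differences.
 */
public static function newtonRaphsonMLE(array $data, float $θ, float $h = 1e-5, int $maxIter = 100): float
{
    // Log-likelihood of the data at parameter value θ, delegating to the child class's pdf
    $ℓ = function (float $θ) use ($data): float {
        $sum = 0;
        foreach ($data as $x) {
            $sum += log(static::pdf($x, $θ));
        }
        return $sum;
    };

    for ($i = 0; $i < $maxIter; $i++) {
        $d1 = ($ℓ($θ + $h) - $ℓ($θ - $h)) / (2 * $h);               // ℓ′(θ)
        $d2 = ($ℓ($θ + $h) - 2 * $ℓ($θ) + $ℓ($θ - $h)) / ($h * $h); // ℓ″(θ)
        if ($d2 == 0.0) {
            break;                // flat curvature; give up
        }
        $Δ = $d1 / $d2;
        $θ = $θ - $Δ;             // Newton step toward a stationary point of ℓ
        if (abs($Δ) < 1e-10) {
            break;                // converged
        }
    }
    return $θ;
}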
Each distribution has a parent class (Continuous) that the generic version could be placed in. If you have a generalized method for calculating the log-likelihood, feel free to add it to MathPHP\Probability\Distribution\Continuous. It will probably be similar to how we use Continuous::inverse() to find the inverse of any distribution; then, for example, the DiracDelta distribution overrides it to always return an inverse of zero.
Look at Continuous::inverse() to see how a parent class can access the methods of a child class using static::method() style calls and the splat operator (...$params).
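For instance, a generic log-likelihood in the Continuous parent could delegate to each child's pdf the same way — a sketch only, assuming a static::pdf($x, ...$params) signature like the one inverse() relies on:

/**
 * Generic log-likelihood: Σ log pdf(xᵢ), delegating to the child class's
 * pdf via late static binding. Distributions with a cheaper closed form
 * can override this, just as DiracDelta overrides inverse().
 */
public static function logLikelihood(array $data, ...$params): float
{
    $sum = 0;
    foreach ($data as $x) {
        $sum += log(static::pdf($x, ...$params));
    }
    return $sum;
}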