skpro
[ENH] design discussion - `pdf` and `pmf` in distributions, discrete, continuous, and mixed
This is a design discussion on how to handle `pdf` and `pmf` in distributions, which can be discrete, continuous (short for "absolutely continuous"), or mixed. Throughout, I assume the domain is the real numbers and that distributions have no singular component.
`scipy` handles these as follows:
- `pmf` is present and `pdf` is not present, for discrete distributions.
- `pdf` is present and `pmf` is not present, for continuous distributions.
- no support for mixed distributions.
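As a quick illustration of the `scipy` behaviour listed above, the following minimal check inspects which methods exist on one continuous and one discrete distribution object. The choice of `norm` and `poisson` is only an example, and the snippet assumes `scipy` is installed.

```python
from scipy import stats

continuous = stats.norm     # a continuous distribution object
discrete = stats.poisson    # a discrete distribution object

# For each distribution, record (has pdf, has pmf).
has_methods = {
    "norm": (hasattr(continuous, "pdf"), hasattr(continuous, "pmf")),
    "poisson": (hasattr(discrete, "pdf"), hasattr(discrete, "pmf")),
}
print(has_methods)
```

Calling the missing method, e.g. `stats.poisson.pdf`, raises an `AttributeError` rather than returning zero, which is the behaviour this proposal would change.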
I think it would be more consistent with composition and unified interfaces a la `sklearn` if all distributions had all these methods, and they corresponded to the measures in the Lebesgue decomposition. That is,
- `pmf` and `pdf` are present in all distributions
- the sum of measures implied by `pmf` and `pdf` is a probability measure
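Spelled out as a formula, this convention could read as follows. The notation is my own shorthand, not something fixed in the proposal: $X$ is the random variable, $A$ is any Borel set of reals, and the sum runs over the (countably many) atoms in $A$.

$$
P(X \in A) \;=\; \int_A \mathrm{pdf}(x)\,dx \;+\; \sum_{x \in A,\ \mathrm{pmf}(x) > 0} \mathrm{pmf}(x),
\qquad
\int_{\mathbb{R}} \mathrm{pdf}(x)\,dx \;+\; \sum_{x:\ \mathrm{pmf}(x) > 0} \mathrm{pmf}(x) \;=\; 1 .
$$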
In particular, this would mean:
- for discrete distributions, `pmf` sums to one, and `pdf` is always zero
- for continuous distributions, `pdf` integrates to one, and `pmf` is always zero
- for mixed distributions, the integral of `pdf` and the sum of `pmf` add up to one. In general, neither the `pdf` integral nor the `pmf` sum equals one on its own.
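A minimal sketch of the discrete and continuous cases in this list, in plain Python. The classes `PointMasses` and `Uniform`, the `atoms` attribute, and the helper `total_mass` are all hypothetical names made up for illustration; none of them is an existing `skpro` or `scipy` API.

```python
class PointMasses:
    """Hypothetical discrete distribution: finitely many atoms with weights."""

    def __init__(self, atoms):
        self.atoms = dict(atoms)  # value -> probability

    def pmf(self, x):
        return self.atoms.get(x, 0.0)

    def pdf(self, x):
        return 0.0  # no absolutely continuous part


class Uniform:
    """Hypothetical continuous distribution on [low, high]."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.atoms = {}  # no point masses

    def pmf(self, x):
        return 0.0  # no discrete part

    def pdf(self, x):
        inside = self.low <= x <= self.high
        return 1.0 / (self.high - self.low) if inside else 0.0


def total_mass(dist, lo, hi, n=10_000):
    """Midpoint-rule integral of pdf over [lo, hi] plus the sum of pmf over atoms."""
    h = (hi - lo) / n
    integral = sum(dist.pdf(lo + (i + 0.5) * h) for i in range(n)) * h
    point_mass = sum(dist.pmf(x) for x in dist.atoms)
    return integral + point_mass


coin = PointMasses({0: 0.5, 1: 0.5})
unif = Uniform(0.0, 2.0)
coin_mass = total_mass(coin, -1.0, 3.0)   # all of it comes from pmf
unif_mass = total_mass(unif, -1.0, 3.0)   # all of it comes from pdf
```

Both objects expose the same two methods, so downstream code never needs to branch on the distribution type.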
Being faithful to the Lebesgue decomposition also has an advantage in mixtures: for `m = Mixture([d1, d2], [w1, w2])`, we have `m.pdf = w1 * d1.pdf + w2 * d2.pdf` and `m.pmf = w1 * d1.pmf + w2 * d2.pmf`, irrespective of the components `d1`, `d2` being continuous, discrete, or mixed (assuming `w1 + w2 == 1`).
In a sense, this seems to be the convention that treats all edge cases consistently.
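To make the mixture rule concrete, here is a small self-contained sketch. `Dist` and `Mixture` are hypothetical stand-ins written for this example (they are not the `skpro` classes), and the zero-inflated uniform with weights 0.3 and 0.7 is an arbitrary choice.

```python
class Dist:
    """Hypothetical distribution given by a density function and a dict of atoms."""

    def __init__(self, density=None, atoms=None):
        self._density = density or (lambda x: 0.0)
        self.atoms = dict(atoms or {})  # value -> probability

    def pdf(self, x):
        return self._density(x)

    def pmf(self, x):
        return self.atoms.get(x, 0.0)


class Mixture:
    """Hypothetical mixture: pdf and pmf are weighted sums of the components'."""

    def __init__(self, components, weights):
        self.components = components
        self.weights = weights
        # Atoms of the mixture are the union of the components' atoms.
        self.atoms = set().union(*(c.atoms for c in components))

    def pdf(self, x):
        return sum(w * c.pdf(x) for w, c in zip(self.weights, self.components))

    def pmf(self, x):
        return sum(w * c.pmf(x) for w, c in zip(self.weights, self.components))


d1 = Dist(atoms={0.0: 1.0})                                    # point mass at 0
d2 = Dist(density=lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0)   # uniform on [0, 1]
m = Mixture([d1, d2], [0.3, 0.7])                              # mixed distribution

# Total mass: midpoint-rule integral of m.pdf over [-1, 2] plus sum of m.pmf.
n, lo, hi = 3000, -1.0, 2.0
h = (hi - lo) / n
mass = sum(m.pdf(lo + (i + 0.5) * h) for i in range(n)) * h
mass += sum(m.pmf(x) for x in m.atoms)
```

Because a `Mixture` exposes the same `pdf`, `pmf`, and `atoms` as its components, it can itself be used as a component of another mixture without special-casing.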
Thoughts?
> Being faithful to the Lebesgue decomposition also has an advantage in mixtures: the `pdf` and `pmf` of a `m = Mixture([d1, d2], [w1, w2])` has `m.pdf = w1 * d1.pdf + w2 * d2.pdf`, and `m.pmf = w1 * d1.pmf + w2 * d2.pmf`, irrespective of components `d1`, `d2` being continuous, discrete, or mixed. (assuming `w1 + w2 == 1`). In a sense, this seems to be the convention that treats all edge cases consistently.
Yes, that is correct: it handles all edge cases, irrespective of `d1`, `d2` being continuous, discrete, or mixed. Wherever the distribution is purely discrete, the `pdf` is zero, so its integral over that interval is 0 and only the `pmf` contributes there.
Wherever the distribution is continuous on an interval, the `pmf` sum over that interval is 0 and only the `pdf` contributes there.
So in the case of a mixed distribution, `m.pdf = w1 * d1.pdf + w2 * d2.pdf` and `m.pmf = w1 * d1.pmf + w2 * d2.pmf` still hold.
The integral of `m.pdf` plus the sum of `m.pmf` also equals 1 when taken over the whole real line, i.e. (-inf, inf).
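The total-mass claim for the mixture follows in one line from linearity, assuming each component already satisfies the convention (integral of its `pdf` plus sum of its `pmf` equals 1) and `w1 + w2 == 1`. Writing $f_i$ for `d_i.pdf` and $p_i$ for `d_i.pmf` (my notation, not from the thread):

$$
\int_{\mathbb{R}} \bigl(w_1 f_1 + w_2 f_2\bigr)\,dx + \sum_x \bigl(w_1 p_1(x) + w_2 p_2(x)\bigr)
= w_1 \Bigl(\int_{\mathbb{R}} f_1\,dx + \sum_x p_1(x)\Bigr) + w_2 \Bigl(\int_{\mathbb{R}} f_2\,dx + \sum_x p_2(x)\Bigr)
= w_1 + w_2 = 1 .
$$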