pycop icon indicating copy to clipboard operation
pycop copied to clipboard

Wrong get_pdf formula for Gaussian copula

Open shudabee opened this issue 11 months ago • 1 comments

In the pdf computation for Gaussian copula, the formula in the last row is wrong. def get_pdf(self, u, v, param): """ # Computes the PDF

    Parameters
    ----------
    u, v : float
        Values of the marginal CDFs 
    param : list
        The correlation coefficient param[0] ∈ [-1,1].
        Used to defined the correlation matrix (squared, symetric and definite positive)
    """
    
    rho = param[0]
    a = np.sqrt(2) * erfinv(2 * u - 1)
    b = np.sqrt(2) * erfinv(2 * v - 1)
    det_rho = 1 - rho**2
    
    return det_rho**-0.5 * np.exp(-((a**2 + b**2) * rho**2 -2 * a * b * rho) / (2 * det_rho))

it should be 1/(2 * np.pi * np.sqrt(det_rho)) * np.exp(-((a2 + b2) * rho**2 -2 * a * b * rho) / (2 * det_rho))

shudabee avatar Jul 25 '23 18:07 shudabee

Hi @shudabee,

The function was corrected by @ioannisrpt, since using scipy.stats.multivariate_normal was triggering issues during the estimation: pycop/issues/4.

Here is a proof that @ioannisrpt's code is correct:

Given a correlation coefficient $\rho \in (-1,1)$, the Gaussian copula is expressed as:

$$ C_{\boldsymbol{\rho}}^{\text{Gauss}}\left(u,v\right) = \Phi_{\boldsymbol{\rho}}\left(\Phi^{-1}\left(u\right), \Phi^{-1}\left(v\right)\right), $$

where $\Phi$ denotes the cumulative distribution function (CDF) of the standard normal distribution, and $\Phi^{-1}$ its inverse. The joint density function of two random variables $X$ and $Y$ can be decomposed as:

$$ f_{XY}(x, y) = c(u, v) f_X(x) f_Y(y), $$

where $f_X(x)$ and $f_Y(y)$ are the marginal density functions of $X$ and $Y$, respectively, and $c(u, v)$ is the copula density. For the Gaussian copula, the density $c_{\boldsymbol{\rho}}^{\text{Gauss}}$ is given by

$$ c_{\boldsymbol{\rho}}^{\text{Gauss}}\left(u,v\right) = \frac{\phi_{\rho}(x, y)}{\phi(x) \phi(y)}, $$

where $\phi_{\rho}(x, y)$ is the joint density of the standard bivariate normal distribution with correlation coefficient $\rho$, and $\phi(x)$ and $\phi(y)$ represent the probability density function (PDF) of the standard normal distribution.

Given that $\phi_{\rho}(x, y)$ is defined as:

$$ \phi_{\rho}(x, y) = \frac{1}{2 \pi \sqrt{1-\rho^2}} \exp \left(-\frac{1}{2(1-\rho^2)}\left[x^2 - 2\rho xy + y^2\right]\right), $$

and $\phi(x)$ as:

$$ \phi(x) = \frac{1}{\sqrt{2 \pi}} \exp \left(-\frac{x^2}{2}\right). $$

Thus, the Gaussian copula density can be expressed as follows:

$$ \begin{align} c_{\boldsymbol{\rho}}^{\text{Gauss}}\left(u,v\right) &= \frac{\frac{1}{2 \pi \sqrt{1-\rho^2}} \exp \left\{-\frac{1}{2\left(1-\rho^2\right)}\left[x^2-2 \rho x y+y^2\right]\right\}}{\frac{1}{\sqrt{2 \pi}} \exp \left(-\frac{1}{2} x^2\right) \times \frac{1}{\sqrt{2 \pi}} \exp \left(-\frac{1}{2} y^2\right)} \\ &= \frac{1}{\sqrt{1-\rho^2}}\exp \left\{-\frac{x^2-2 \rho x y+y^2}{2\left(1-\rho^2\right)} +\frac{1}{2} x^2+\frac{1}{2} y^2\right\} \\ &= \frac{1}{\sqrt{1-\rho^2}}\exp \left\{-\frac{(x^2+y^2)\rho^2 - 2\rho x y}{2\left(1-\rho^2\right)}\right\} \end{align} $$

where $x = \Phi^{-1}(u)$ and $y = \Phi^{-1}(v)$. So the corresponding python code:

import numpy as np
from scipy.special import erfinv
a = np.sqrt(2) * erfinv(2 * u - 1)
b = np.sqrt(2) * erfinv(2 * v - 1)
det_rho = 1 - rho**2
return det_rho**-0.5 * np.exp(-((a**2 + b**2) * rho**2 -2 * a * b * rho) / (2 * det_rho))

is correct.

If you are not convince, I believe following this approach will give the same results:

$$ c_{\boldsymbol{\rho}}^{Gauss}\left(u,v\right)=\frac{\phi_{\rho}(x, y)}{\phi(x) \phi(y)}= \frac{\phi_{\boldsymbol{\rho}}\left(\Phi^{-1}\left(u\right), \Phi^{-1}\left(v\right)\right)}{ uv}, $$

with the corresponding python code:

from scipy.stats import norm, multivariate_normal

def get_pdf(u, v, param):
    rho = param[0]
    y1 = norm.ppf(u, 0, 1)
    y2 = norm.ppf(v, 0, 1)
    multivariate_normal.pdf((y1,y2), mean=None, cov=[[1,rho],[rho,1]])/(u*v)

maximenc avatar Mar 13 '24 18:03 maximenc