pymc-marketing icon indicating copy to clipboard operation
pymc-marketing copied to clipboard

MMM Example Notebook: Time Varying Coefficients

Open juanitorduz opened this issue 2 years ago • 10 comments

To prove the flexibility and potential of framework (namely, bayesian stats and pymc), we would like to have an example notebook to illustrate how to extend the base model (based on simulated data) introduced in https://github.com/pymc-labs/pymmmc/pull/41 by allowing time varying coefficients via gaussian processes.

juanitorduz avatar Jun 22 '22 18:06 juanitorduz

Are there any concrete plans on if/when this is going to be implemented. I would be quite(!) interested in this.

sotte avatar Jan 19 '23 12:01 sotte

Are there any concrete plans on if/when this is going to be implemented. I would be quite(!) interested in this.

I think we definitely want to bring this up sometime (no ETA). In the meantime here is a simulated example using a gaussian random walk. Still, I think gaussian processes should work (and scale) a bit better, see for example this blog post by PyMC Labs

juanitorduz avatar Jan 19 '23 13:01 juanitorduz

Thanks for the quick response. I was actually hoping you would point to the hierarchical Gaussian process post :+1: I'll reread your post about the Gaussian random walk, as it seems simpler.

sotte avatar Jan 22 '23 15:01 sotte

I've ported uber/orbit like model in for time-varying coefficient in pymc. Here are MVP snippets

# From: https://github.com/uber/orbit/blob/dev/orbit/utils/kernels.py
def gaussian_kernel(t, t_refs, rho: float = 0.1, alpha: float = 1, point_to_flatten: float = 1):
    ker = np.where(
        (t <= point_to_flatten).reshape(-1, 1),
        (alpha**2) * np.exp(-1 * np.power(t.reshape(-1, 1) - t_refs.reshape(1, -1), 2)/(2 * rho**2 )),
        (alpha**2) * np.exp(-1 * np.power(np.array([point_to_flatten] * t.shape[0]).reshape(-1, 1) - t_refs.reshape(1, -1), 2)/(2 * rho**2 ))
    )
    return ker / np.sum(ker, axis=1, keepdims=True)

def triangle_kernel(t, t_refs):
    ker = np.zeros((t.shape[0], t_refs.shape[0]))
    ker[t < t_refs[0], 0] = 1
    for idx in range(t_refs.shape[0] - 1):
        valid_idxes = (t >= t_refs[idx]) & (t < t_refs[idx + 1])
        norm = t_refs[idx + 1] - t_refs[idx]
        ker[valid_idxes, idx] = np.abs(t[valid_idxes] - t_refs[idx + 1])/norm
        ker[valid_idxes, idx + 1] = np.abs(t[valid_idxes] - t_refs[idx])/norm
    ker[t >= t_refs[-1], -1] = 1
    return ker / np.sum(ker, axis=1, keepdims=True)

And

# Hyper parameters
n_knots = 10
n_order = 3

date = data_df.date.to_numpy()
t = np.array(data_df.date.index)
t = (t - t.min())/(t.max() - t.min())
y = data_df["y"].to_numpy().reshape(-1)
x = data_df[["x1", "x2", "x3"]].to_numpy()

t_refs = np.linspace(t.min(), t.max(), n_knots + 1)
periods = np.array(data_df.date.dt.day_of_year) / 365.25

trend_ker = triangle_kernel(t, t_refs)
reg_ker = gaussian_kernel(t, t_refs)
fourier_features = pd.DataFrame(
    {
        f"{func}_order_{order}": getattr(np, func)(2 * np.pi * periods * order)
        for order in range(1, n_order + 1)
        for func in ("sin", "cos")
    }
)

And

with pm.Model(coords=coords) as model:
    # Trend Priors
    b_trend_sigma = pm.HalfNormal("b_trend_sigma", sigma=1)
    b_trend = pt.cumsum(pm.Laplace("b_trend", mu=0, b=b_trend_sigma, shape=(n_knots + 1, )))
    
    # Fourier Priors
    b_fourier_sigma = pm.HalfNormal("b_fourier_sigma", sigma=1) 
    b_fourier = pt.cumsum(pm.Laplace("b_fourier", mu=0, b=b_fourier_sigma, shape=(n_knots + 1, 2 * n_order)), axis=0)
    
    # Regression Priors
    pool_mu = pm.Normal("pool_mu", mu=0, sigma=1)
    pool_sigma = pm.HalfNormal("pool_sigma", sigma=1) 
    
    reg_mu = pm.Normal("reg_mu", mu=pool_mu, sigma=pool_sigma, shape=(x.shape[1], ))
    reg_sigma = pm.HalfNormal("reg_sigma", sigma=0.25, shape=(x.shape[1], ))
    if is_only_positive:
        b_regs = pt.abs(
            pm.Normal(
                "b_regs",
                mu=pt.abs(reg_mu.reshape((x.shape[1], 1))),
                sigma=reg_sigma.reshape((x.shape[1], 1)),
                shape=(x.shape[1], len(t_refs), )
            ).T
        )
    else:
        b_regs = pm.Normal(
            "b_regs",
            mu=reg_mu.reshape((x.shape[1], 1)),
            sigma=reg_sigma.reshape((x.shape[1], 1)),
            shape=(x.shape[1], len(t_refs), )
        ).T
    
    # Calculation
    l_trend = pm.Deterministic("l_trend", var=(pm.math.dot(trend_ker, b_trend)), dims="date")
    s_fourier = pm.Deterministic("s_fourier", var=((fourier_features.to_numpy() * pm.math.dot(trend_ker, b_fourier))).sum(axis=1), dims="date")
    r_feat = pm.Deterministic("r_feat", var=((x * pm.math.dot(reg_ker, b_regs))).sum(axis=1), dims="date")
    
    nu = pm.Gamma(name="nu", alpha=10, beta=1)
    sigma = pm.HalfNormal(name="sigma", sigma=0.5)
    mu = pm.Deterministic("mu", var=(l_trend + s_fourier + r_feat), dims="date")
    
     # --- likelihood ---
    pm.StudentT(name="likelihood", nu=nu, mu=mu, sigma=sigma, observed=y, dims="date")

Using synthetic data from https://orbit-ml.readthedocs.io/en/latest/tutorials/ktr2.html

image

thipokKub avatar Apr 18 '23 16:04 thipokKub

Wow! Amazing! Do you wanna open a PR?

juanitorduz avatar Apr 18 '23 17:04 juanitorduz

I'm not sure where to put it exactly to be honest. Do I have to create model like DelayedSaturatedMMM, or it should be an auxiliary script entirely (not MMM model, but can be use as a template)?

thipokKub avatar Apr 18 '23 17:04 thipokKub

If you want you can create a PR with a little example. We do not merge one but we open one (together) where we start designing the API, for example. We will ask the PyMC-marketing team for feedback :)

juanitorduz avatar Apr 18 '23 17:04 juanitorduz

Okay, I've done it https://github.com/pymc-labs/pymc-marketing/pull/250

thipokKub avatar Apr 18 '23 18:04 thipokKub

Update, I had posted a question on pymc discourse about HGP implementation, and I've got it to work (as a concept) from bwengals's gist. So it should be able to add HGP as Time Varying Coefficients too?

thipokKub avatar May 08 '23 16:05 thipokKub

Oh wow! You are being super fast! 😄 (I'm not responding fast as I'm in parental leave). Feel free to open an other PR with the HS implementation! 🤜🤛

juanitorduz avatar May 08 '23 16:05 juanitorduz