Cannot save/load model

Open kb-open opened this issue 10 months ago • 23 comments

I'm using version 0.11.0. Saving the model with pickle fails with the following error:

PicklingError: Can't pickle <function create_dim_handler.<locals>.func at 0x00000202218598A0>: it's not found as pymc_marketing.prior.create_dim_handler.<locals>.func

I've tried joblib and pickle. Both result in this error.

kb-open avatar Feb 03 '25 06:02 kb-open

Is there an issue with the save and load methods? They are the intended I/O path.

As an alternative, you will have to use cloudpickle, because of the local functions.
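
To illustrate the point about local functions, here is a minimal sketch that does not use pymc-marketing at all. `create_handler` is a hypothetical stand-in for any factory that returns a function defined inside another function; the standard pickler cannot serialize such a function, while cloudpickle (a third-party package, imported only if available) can.

```python
import pickle


def create_handler(scale):
    """Hypothetical factory that returns a locally defined function."""

    def func(x):
        return x * scale

    return func


handler = create_handler(2)

# The standard pickler stores functions by qualified name, so a function
# defined inside another function cannot be looked up again at load time.
try:
    pickle.dumps(handler)
    stdlib_pickle_ok = True
except (pickle.PicklingError, AttributeError) as err:
    stdlib_pickle_ok = False
    print(f"pickle failed: {err}")

# cloudpickle serializes the function by value instead. It is a third-party
# package, so the import is guarded here.
try:
    import cloudpickle
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    restored = pickle.loads(cloudpickle.dumps(handler))
    print(restored(21))
```

The exact exception type raised by the standard pickler depends on the Python version, which is why the sketch catches both.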

williambdean avatar Feb 03 '25 07:02 williambdean

I think we can close the issue. I tried with the default save and load methods, and they worked fine.

kb-open avatar Feb 03 '25 13:02 kb-open

Sounds good @kb-open

If you run into any problems, feel free to open another issue. FYI, the load_from_idata classmethod can also be used for additional I/O flexibility.

williambdean avatar Feb 03 '25 15:02 williambdean

The load method does not work for the causal MMM. For example, I get the following error when trying to load the causal_mm model object from the official documentation (https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_causal_identification.html):

DifferentModelError                       Traceback (most recent call last)
File ~\anaconda3\envs\mmm\Lib\site-packages\pymc_marketing\model_builder.py:617, in ModelBuilder.load(cls, fname)
    616 try:
--> 617     return cls.load_from_idata(idata)
    618 except DifferentModelError as e:

File ~\anaconda3\envs\mmm\Lib\site-packages\pymc_marketing\model_builder.py:572, in ModelBuilder.load_from_idata(cls, idata)
    566     msg = (
    567         "The model id in the InferenceData does not match the model id. "
    568         "There was no error loading the inference data, but the model may "
    569         "be different. "
    570         "Investigate if the model structure or configuration has changed."
    571     )
--> 572     raise DifferentModelError(msg)
    574 return model

DifferentModelError: The model id in the InferenceData does not match the model id. There was no error loading the inference data, but the model may be different. Investigate if the model structure or configuration has changed.

The above exception was the direct cause of the following exception:

DifferentModelError                       Traceback (most recent call last)
Cell In[55], line 1
----> 1 model_bayesian = MMM.load('model/model_bayesian.nc')

File ~\anaconda3\envs\mmm\Lib\site-packages\pymc_marketing\model_builder.py:624, in ModelBuilder.load(cls, fname)
    618 except DifferentModelError as e:
    619     error_msg = (
    620         f"The file '{fname}' does not contain "
    621         "an InferenceData of the same model "
    622         f"or configuration as '{cls._model_type}'"
    623     )
--> 624     raise DifferentModelError(error_msg) from e

DifferentModelError: The file 'model/model_bayesian.nc' does not contain an InferenceData of the same model or configuration as 'MMM'

kb-open avatar Feb 08 '25 11:02 kb-open

My code:

from pymc_marketing.mmm import MMM

model_bayesian.save('model/model_bayesian.nc')
model_bayesian = MMM.load('model/model_bayesian.nc')

And model_bayesian is the same as causal_mm; only the name is different.

kb-open avatar Feb 08 '25 11:02 kb-open

So you are running the notebook? Or do you have a different configuration than the notebook?

williambdean avatar Feb 08 '25 12:02 williambdean

Same configuration but I tried to save and load.

kb-open avatar Feb 08 '25 13:02 kb-open

> Same configuration but I tried to save and load.

Which version(s)?

williambdean avatar Feb 08 '25 13:02 williambdean

0.11.0

kb-open avatar Feb 08 '25 14:02 kb-open

Just to give a little more info, in case it helps solve the problem.

As soon as I add the dag and outcome variables to the model configuration, the issue occurs. That is, the issue occurs only with the causal model. As soon as I remove these variables (while keeping everything else exactly the same), the issue disappears.

One thing I notice is that, with the causal model, the saturation_beta variable no longer exists and saturation_alpha appears instead. I'm talking about the changes in the default configs. Maybe this is a clue for debugging the issue, @wd60622.
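
A sketch of why a change in default config keys could matter, under the assumption (not confirmed in this thread) that the model id is a short hash over the configuration, version, and model type. `config_id` is a hypothetical helper, not the library's implementation, and the prior values in `config_a` are placeholders.

```python
import hashlib
import json


def config_id(model_config, version, model_type):
    """Hypothetical stand-in for a model id: a short hash of the configuration."""
    hasher = hashlib.sha256()
    hasher.update(json.dumps(model_config, sort_keys=True).encode())
    hasher.update(version.encode())
    hasher.update(model_type.encode())
    return hasher.hexdigest()[:16]


# Two configurations that differ only in which saturation prior is present.
# The saturation_beta values are placeholders for illustration.
config_a = {"saturation_beta": {"dist": "HalfNormal", "kwargs": {"sigma": 2}}}
config_b = {"saturation_alpha": {"dist": "Gamma", "kwargs": {"mu": 2, "sigma": 1}}}

id_a = config_id(config_a, "0.0.2", "MMM")
id_b = config_id(config_b, "0.0.2", "MMM")

print(id_a, id_b, id_a == id_b)
```

If the id works roughly like this, a model rebuilt at load time with different defaults than the ones in effect at save time would get a different id even though the data file itself loads fine.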

kb-open avatar Feb 08 '25 16:02 kb-open

Thanks for the context. What are the values you are passing to dag?

williambdean avatar Feb 09 '25 07:02 williambdean

causal_dag = """digraph {x1 -> y; x2 -> y; x1 -> x2; holiday_signal -> y; holiday_signal -> x1; holiday_signal -> x2; competitor_offers -> x2; competitor_offers -> y; market_growth -> y;}"""

kb-open avatar Feb 09 '25 08:02 kb-open

Can you load the nc file directly with arviz and share the attrs of the InferenceData and the values of the fit_data Dataset group?

williambdean avatar Feb 12 '25 02:02 williambdean

Code used:

import arviz as az

idata = az.from_netcdf('model/model_bayesian.nc')
print("Attributes of InferenceData:")
print(idata.attrs)

if "fit_data" in idata.groups():
    print("\nValues of 'fit_data' Dataset group:")
    print(idata.fit_data)
else:
    print("\n'fit_data' group not found in the InferenceData object.")

Output:

Attributes of InferenceData:
{'id': 'bcbce0522a5869f2', 'model_type': 'MMM', 'version': '0.0.2', 'sampler_config': '{}', 'model_config': '{"intercept": {"dist": "HalfNormal", "kwargs": {"sigma": 0.5}}, "likelihood": {"dist": "Normal", "kwargs": {"sigma": {"dist": "HalfNormal", "kwargs": {"sigma": 2}}}, "dims": ["date"]}, "gamma_control": {"dist": "Normal", "kwargs": {"mu": 0, "sigma": 1}, "dims": ["control"]}, "gamma_fourier": {"dist": "Laplace", "kwargs": {"mu": 0, "b": 1}, "dims": ["fourier_mode"]}, "intercept_tvp_config": {"m": 200, "L": 729.25, "eta_lam": 1.0, "ls_mu": 100.0, "ls_sigma": 10.0, "cov_func": null}, "media_tvp_config": {"m": 200, "L": 729.25, "eta_lam": 1.0, "ls_mu": 5.0, "ls_sigma": 10.0, "cov_func": null}, "adstock_alpha": {"dist": "Beta", "kwargs": {"alpha": 1, "beta": 3}, "dims": ["channel"]}, "saturation_alpha": {"dist": "Gamma", "kwargs": {"mu": 2, "sigma": 1}, "dims": ["channel"]}, "saturation_lam": {"dist": "HalfNormal", "kwargs": {"sigma": 1}, "dims": ["channel"]}}', 'date_column': '"date_str"', 'adstock': '{"lookup_name": "geometric", "prefix": "adstock", "priors": {"alpha": {"dist": "Beta", "kwargs": {"alpha": 1, "beta": 3}, "dims": ["channel"]}}, "l_max": 12, "normalize": true, "mode": "After"}', 'saturation': '{"lookup_name": "michaelis_menten", "prefix": "saturation", "priors": {"alpha": {"dist": "Gamma", "kwargs": {"mu": 2, "sigma": 1}, "dims": ["channel"]}, "lam": {"dist": "HalfNormal", "kwargs": {"sigma": 1}, "dims": ["channel"]}}}', 'adstock_first': 'true', 'control_columns': '["holiday_signal"]', 'channel_columns': '["x1", "x2"]', 'validate_data': 'true', 'yearly_seasonality': 'null', 'time_varying_intercept': 'true', 'time_varying_media': 'true', 'dag': '"digraph {x1 -> y;\\n                         x2 -> y;\\n                         x1 -> x2;\\n                         holiday_signal -> y;\\n                         holiday_signal -> x1;\\n                         holiday_signal -> x2;\\n                         competitor_offers -> x2;\\n                 
        competitor_offers -> y;\\n                         market_growth -> y;}"', 'treatment_nodes': '["x1", "x2"]', 'outcome_node': '"y"'}

Values of 'fit_data' Dataset group:
<xarray.Dataset> Size: 76kB
Dimensions:            (date: 729)
Coordinates:
  * date               (date) datetime64[ns] 6kB 2022-01-01 ... 2023-12-30
Data variables:
    holiday_signal     (date) float64 6kB ...
    competitor_offers  (date) float64 6kB ...
    x1                 (date) float64 6kB ...
    x2                 (date) float64 6kB ...
    market_growth      (date) float64 6kB ...
    t                  (date) float64 6kB ...
    date_str           (date) <U10 29kB ...
    y                  (date) float64 6kB ...
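
Most of the attribute values above are JSON-encoded strings, which makes the printed dictionary hard to read. A minimal sketch of decoding them, using a trimmed copy of the attributes shown above; `decode_attrs` is a hypothetical helper, and values that are not valid JSON (such as the id) are left unchanged.

```python
import json


def decode_attrs(attrs):
    """Parse JSON-encoded attribute values; leave anything else unchanged."""
    decoded = {}
    for key, value in attrs.items():
        try:
            decoded[key] = json.loads(value)
        except (TypeError, ValueError):
            decoded[key] = value
    return decoded


# A trimmed copy of the attributes printed above.
attrs = {
    "id": "bcbce0522a5869f2",
    "model_type": "MMM",
    "version": "0.0.2",
    "control_columns": '["holiday_signal"]',
    "channel_columns": '["x1", "x2"]',
    "outcome_node": '"y"',
}

for key, value in decode_attrs(attrs).items():
    print(f"{key}: {value!r}")
```

One caveat of this approach: a plain string that happens to be valid JSON (for example an id made only of digits) would be converted to a number.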

kb-open avatar Feb 12 '25 13:02 kb-open

Tagging @wd60622 just in case my comment above got missed, since there has been no update.

kb-open avatar Feb 16 '25 13:02 kb-open

I am unable to reproduce. Can you make a small reproducible example?

williambdean avatar Feb 16 '25 19:02 williambdean

example.zip Please find attached @wd60622

kb-open avatar Feb 17 '25 13:02 kb-open

Same issue here! I commented out the following check:

# if model.id != idata.attrs["id"]:
#     raise ValueError(
#         f"The file '{fname}' does not contain an inference data of the same model or configuration as '{cls._model_type}'"
#     )

and implemented a load method that is otherwise exactly the same as the one in ModelBuilder, under my customized class (class MMMModel(ModelBuilder)). It works. I was wondering what this clause is doing.

bravoila avatar Mar 07 '25 16:03 bravoila

Hi @bravoila, can you inspect the mmm.idata.attrs of the two models and share the differences?
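
A minimal sketch of one way to compare two attrs dictionaries key by key. `diff_attrs` is a hypothetical helper, and the two dictionaries are placeholders; in practice they would be the attrs of the two models.

```python
def diff_attrs(left, right):
    """Return {key: (left_value, right_value)} for every key whose values differ."""
    missing = object()
    differences = {}
    for key in sorted(set(left) | set(right)):
        a = left.get(key, missing)
        b = right.get(key, missing)
        if a != b:
            differences[key] = (
                "<missing>" if a is missing else a,
                "<missing>" if b is missing else b,
            )
    return differences


# Placeholder values for illustration only.
saved = {"id": "0000000000000001", "model_type": "MMM", "version": "0.0.2"}
rebuilt = {
    "id": "0000000000000002",
    "model_type": "MMM",
    "version": "0.0.2",
    "dag": "digraph {x1 -> y;}",
}

for key, (a, b) in diff_attrs(saved, rebuilt).items():
    print(f"{key}: {a!r} != {b!r}")
```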

williambdean avatar Mar 14 '25 01:03 williambdean

> Hi @bravoila, can you inspect the mmm.idata.attrs of the two models and share the differences?

I'm sorry for not getting back to you sooner. The two models are indeed different because I use cross-validation when training the model and then select the best one, so the ids are different.

bravoila avatar Mar 25 '25 02:03 bravoila

> Same issue here! I commented out the following check:
>
> # if model.id != idata.attrs["id"]:
> #     raise ValueError(
> #         f"The file '{fname}' does not contain an inference data of the same model or configuration as '{cls._model_type}'"
> #     )
>
> and implemented a load method that is otherwise exactly the same as the one in ModelBuilder, under my customized class (class MMMModel(ModelBuilder)). It works. I was wondering what this clause is doing.

@williambdean You should be able to find a clue from this comment.

kb-open avatar Mar 25 '25 15:03 kb-open

> example.zip Please find attached @wd60622

@williambdean were you able to reproduce?

kb-open avatar Mar 25 '25 15:03 kb-open

> > example.zip Please find attached @wd60622
>
> @williambdean were you able to reproduce?

Please provide a minimal example that is not a zip file.

williambdean avatar Mar 25 '25 16:03 williambdean

Hi, I am also facing a similar issue with saving and loading an MMM, and I have included an example below. Please note that I changed some file names for clarity.

Using these methods, I get the following error:

mmm_trained.save("iter11.nc")
mmm_loaded = MMM.load("iter11.nc")

DifferentModelError: The file 'iter11.nc' does not contain an InferenceData of the same model or configuration as 'MMM'

However, when I manually load the nc file with arviz, the InferenceData attrs are:

import arviz as az

saved_data = az.from_netcdf('iter11.nc')
saved_data.attrs

{'id': 'a8aaafd8d5e4b752',
 'model_type': 'MMM',
 'version': '0.0.2',
...}

Similarly, the attrs of mmm_trained:

mmm_trained.idata.attrs

{'id': 'a8aaafd8d5e4b752',
 'model_type': 'MMM',
 'version': '0.0.2',
...}

I don't understand why it is throwing an error. I am on pymc-marketing==0.13.1.

wn385 avatar May 07 '25 19:05 wn385

I have a related issue: I can save and load models within the same environment (locally or in Vertex AI) without problems, but I cannot load models produced in Vertex AI on my local device, even though the ids seem to match when I inspect the .nc file.

For context, the model specification is that of the introduction notebook. Below is a screenshot of the model I produced in Vertex AI, which I'm able to load with no problem: Image

The screenshot below is the same .nc file opened using xarray's open_dataset function: Image

As you can see, opening the .nc file locally gives an error: Image

EDIT: the from_netcdf function seems to work fine! Of course this is not the same as importing the full model, but it is an indication that the first step of the code works properly.
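
Since the failure appears only when the file moves between environments, one thing that may be worth ruling out is a difference in installed package versions. A minimal sketch that could be run in both environments and compared; the package list is an assumption about what might affect saving and loading, not a confirmed cause.

```python
from importlib import metadata

# Packages that plausibly affect saving and loading; adjust the list as needed.
PACKAGES = ["pymc-marketing", "pymc", "arviz", "xarray", "netCDF4", "h5netcdf", "numpy"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```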

bart-vanvlerken avatar Jul 10 '25 19:07 bart-vanvlerken