pymc-marketing
Cannot save/load model
I'm using version 0.11.0. I'm trying to save the model with pickle, but it gives the following error:
PicklingError: Can't pickle <function create_dim_handler.<locals>.func at 0x00000202218598A0>: it's not found as pymc_marketing.prior.create_dim_handler.<locals>.func
I've tried joblib and pickle. Both result in this error.
Is there an issue with the save and load methods? Those are the intended I/O path.
Alternatively, you will have to use cloudpickle, because of the local functions.
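For anyone landing here with the same PicklingError, here is a minimal sketch of why the standard pickler fails on local functions and how cloudpickle gets around it. The factory `make_handler` is a hypothetical stand-in, not the actual `create_dim_handler` from pymc_marketing, and cloudpickle is assumed to be installed separately.

```python
import pickle

try:
    import cloudpickle  # third-party; assumed installed via `pip install cloudpickle`
except ImportError:
    cloudpickle = None


def make_handler(scale):
    """Hypothetical factory that returns a local (nested) function."""

    def func(x):
        return x * scale

    return func


handler = make_handler(3)

# The standard pickler stores functions by qualified name, so a function
# defined inside another function cannot be looked up again on load.
try:
    pickle.dumps(handler)
    stdlib_pickle_ok = True
except (pickle.PicklingError, AttributeError):
    stdlib_pickle_ok = False

# cloudpickle serializes the function by value instead of by name.
if cloudpickle is not None:
    restored = pickle.loads(cloudpickle.dumps(handler))
    roundtrip_result = restored(2)
else:
    roundtrip_result = None

print(stdlib_pickle_ok, roundtrip_result)
```

This only illustrates the mechanism; the built-in save and load methods remain the intended route for models.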
I think we can close the issue. I tried with the default save and load methods, and they worked fine.
Sounds good @kb-open
If you run into any issues, feel free to open another issue.
FYI, the load_from_idata classmethod can also be used for additional I/O flexibility.
The load method does not work for causal MMM. For example, I get the following error when trying to load the causal_mm model object as per the official documentation here: https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_causal_identification.html.
DifferentModelError Traceback (most recent call last)
File ~\anaconda3\envs\mmm\Lib\site-packages\pymc_marketing\model_builder.py:617, in ModelBuilder.load(cls, fname)
616 try:
--> 617 return cls.load_from_idata(idata)
618 except DifferentModelError as e:
File ~\anaconda3\envs\mmm\Lib\site-packages\pymc_marketing\model_builder.py:572, in ModelBuilder.load_from_idata(cls, idata)
566 msg = (
567 "The model id in the InferenceData does not match the model id. "
568 "There was no error loading the inference data, but the model may "
569 "be different. "
570 "Investigate if the model structure or configuration has changed."
571 )
--> 572 raise DifferentModelError(msg)
574 return model
DifferentModelError: The model id in the InferenceData does not match the model id. There was no error loading the inference data, but the model may be different. Investigate if the model structure or configuration has changed.
The above exception was the direct cause of the following exception:
DifferentModelError Traceback (most recent call last)
Cell In[55], line 1
----> 1 model_bayesian = MMM.load('model/model_bayesian.nc')
File ~\anaconda3\envs\mmm\Lib\site-packages\pymc_marketing\model_builder.py:624, in ModelBuilder.load(cls, fname)
618 except DifferentModelError as e:
619 error_msg = (
620 f"The file '{fname}' does not contain "
621 "an InferenceData of the same model "
622 f"or configuration as '{cls._model_type}'"
623 )
--> 624 raise DifferentModelError(error_msg) from e
DifferentModelError: The file 'model/model_bayesian.nc' does not contain an InferenceData of the same model or configuration as 'MMM'
My code:
model_bayesian.save('model/model_bayesian.nc')
model_bayesian = MMM.load('model/model_bayesian.nc')
model_bayesian is the same as causal_mm; only the name is different.
So you are running the notebook? Or do you have a different configuration than the notebook?
Same configuration but I tried to save and load.
Which version(s)
0.11.0
Just to give a little more info, in case it helps solve the problem.
As soon as I add the dag and outcome variables to the model configuration, the issue occurs. That is, the issue occurs only with the causal model. As soon as I remove these variables (while keeping everything else exactly the same), the issue disappears.
One thing I notice is that, with the causal model, the saturation_beta variable no longer exists and saturation_alpha appears instead. I'm talking about the changes in the default configs. Maybe this is a clue for debugging the issue, @wd60622.
Thanks for the context. What are the values you are passing to dag?
causal_dag = """digraph {x1 -> y; x2 -> y; x1 -> x2; holiday_signal -> y; holiday_signal -> x1; holiday_signal -> x2; competitor_offers -> x2; competitor_offers -> y; market_growth -> y;}"""
Can you load the .nc file directly with arviz and share the attrs of the InferenceData and the values of the fit_data Dataset group?
Code used:
import arviz as az

# Load the saved file directly, bypassing MMM.load
idata = az.from_netcdf('model/model_bayesian.nc')
print("Attributes of InferenceData:")
print(idata.attrs)
if "fit_data" in idata.groups():
    print("\nValues of 'fit_data' Dataset group:")
    print(idata.fit_data)
else:
    print("\n'fit_data' group not found in the InferenceData object.")
Output:
Attributes of InferenceData:
{'id': 'bcbce0522a5869f2', 'model_type': 'MMM', 'version': '0.0.2', 'sampler_config': '{}', 'model_config': '{"intercept": {"dist": "HalfNormal", "kwargs": {"sigma": 0.5}}, "likelihood": {"dist": "Normal", "kwargs": {"sigma": {"dist": "HalfNormal", "kwargs": {"sigma": 2}}}, "dims": ["date"]}, "gamma_control": {"dist": "Normal", "kwargs": {"mu": 0, "sigma": 1}, "dims": ["control"]}, "gamma_fourier": {"dist": "Laplace", "kwargs": {"mu": 0, "b": 1}, "dims": ["fourier_mode"]}, "intercept_tvp_config": {"m": 200, "L": 729.25, "eta_lam": 1.0, "ls_mu": 100.0, "ls_sigma": 10.0, "cov_func": null}, "media_tvp_config": {"m": 200, "L": 729.25, "eta_lam": 1.0, "ls_mu": 5.0, "ls_sigma": 10.0, "cov_func": null}, "adstock_alpha": {"dist": "Beta", "kwargs": {"alpha": 1, "beta": 3}, "dims": ["channel"]}, "saturation_alpha": {"dist": "Gamma", "kwargs": {"mu": 2, "sigma": 1}, "dims": ["channel"]}, "saturation_lam": {"dist": "HalfNormal", "kwargs": {"sigma": 1}, "dims": ["channel"]}}', 'date_column': '"date_str"', 'adstock': '{"lookup_name": "geometric", "prefix": "adstock", "priors": {"alpha": {"dist": "Beta", "kwargs": {"alpha": 1, "beta": 3}, "dims": ["channel"]}}, "l_max": 12, "normalize": true, "mode": "After"}', 'saturation': '{"lookup_name": "michaelis_menten", "prefix": "saturation", "priors": {"alpha": {"dist": "Gamma", "kwargs": {"mu": 2, "sigma": 1}, "dims": ["channel"]}, "lam": {"dist": "HalfNormal", "kwargs": {"sigma": 1}, "dims": ["channel"]}}}', 'adstock_first': 'true', 'control_columns': '["holiday_signal"]', 'channel_columns': '["x1", "x2"]', 'validate_data': 'true', 'yearly_seasonality': 'null', 'time_varying_intercept': 'true', 'time_varying_media': 'true', 'dag': '"digraph {x1 -> y;\\n x2 -> y;\\n x1 -> x2;\\n holiday_signal -> y;\\n holiday_signal -> x1;\\n holiday_signal -> x2;\\n competitor_offers -> x2;\\n competitor_offers -> y;\\n market_growth -> y;}"', 'treatment_nodes': '["x1", "x2"]', 'outcome_node': '"y"'}
Values of 'fit_data' Dataset group:
<xarray.Dataset> Size: 76kB
Dimensions: (date: 729)
Coordinates:
* date (date) datetime64[ns] 6kB 2022-01-01 ... 2023-12-30
Data variables:
holiday_signal (date) float64 6kB ...
competitor_offers (date) float64 6kB ...
x1 (date) float64 6kB ...
x2 (date) float64 6kB ...
market_growth (date) float64 6kB ...
t (date) float64 6kB ...
date_str (date) <U10 29kB ...
y (date) float64 6kB ...
Tagging @wd60622 just in case my comment above got missed, since there has been no update.
I am unable to reproduce. Can you make a small reproducible example?
Please find example.zip attached, @wd60622.
Same issue here! I commented out the following check:
if model.id != idata.attrs["id"]:
    raise ValueError(
        f"The file '{fname}' does not contain an inference data of the same model or configuration as '{cls._model_type}'"
    )
and implemented a load method, otherwise identical to the one in ModelBuilder, in my custom class (class MMMModel(ModelBuilder)). It works. I was wondering what this check is doing.
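On what the check is doing: judging from the error message, it guards against loading inference data into a model whose configuration differs from the one that produced the file. Below is a minimal sketch of that idea, assuming the id is a hash of the model configuration. The function `config_id` and both dictionaries are hypothetical, and this is not pymc-marketing's actual implementation.

```python
import hashlib
import json


def config_id(config: dict) -> str:
    """Hypothetical id: a short hash of the JSON-serialized configuration."""
    payload = json.dumps(config, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Configuration as it was when the model was saved (illustrative values).
saved_config = {
    "channel_columns": ["x1", "x2"],
    "control_columns": ["holiday_signal"],
    "dag": "digraph {x1 -> y; x2 -> y;}",
    "outcome_node": "y",
}

# Suppose the loader rebuilds the model but drops or alters one setting.
rebuilt_config = {k: v for k, v in saved_config.items() if k != "dag"}

saved_id = config_id(saved_config)
rebuilt_id = config_id(rebuilt_config)

# Any difference in the configuration yields a different id,
# which is the situation the check is meant to flag.
ids_match = saved_id == rebuilt_id
print(saved_id, rebuilt_id, ids_match)
```

Under this assumption, a setting that is saved but not restored on load (or restored in a slightly different form) would be enough to trigger the error even though the file itself is fine.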
Hi @bravoila, can you inspect the mmm.idata.attrs of the two models and share the differences?
I'm sorry for not getting back to you sooner. The two models are indeed different because I use cross-validation when training the model and then select the best one, so the ids are different.
@williambdean You should be able to find a clue in the comment above about commenting out the model.id check.
@williambdean were you able to reproduce?
Please provide a minimal example that is not a zip file.
Hi, I am also facing a similar issue with Save / Load MMM and attached an example below. Please note that I overwrote some file names for clarity.
Using these methods, I get the following error:
mmm_trained.save("iter11.nc")
mmm_loaded = MMM.load("iter11.nc")
DifferentModelError: The file 'iter11.nc' does not contain an InferenceData of the
same model or configuration as 'MMM'
However, when I manually load the nc file with arviz, the InferenceData attrs are:
saved_data = az.from_netcdf('iter11.nc')
saved_data.attrs
{'id': 'a8aaafd8d5e4b752',
'model_type': 'MMM',
'version': '0.0.2',
...}
Similarly, the attrs of mmm_trained:
mmm_trained.idata.attrs
{'id': 'a8aaafd8d5e4b752',
'model_type': 'MMM',
'version': '0.0.2',
...}
I don't understand why it is throwing an error. I am on version pymc-marketing==0.13.1
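Since the attrs above are truncated, comparing every key programmatically may be more reliable than comparing the first few by eye. Here is a small sketch of such a comparison; `diff_attrs` is a hypothetical helper and the two dictionaries are stand-ins with made-up values, not the real attrs from this report.

```python
def diff_attrs(left: dict, right: dict) -> dict:
    """Return {key: (left_value, right_value)} for every key that differs."""
    missing = object()
    differences = {}
    for key in sorted(set(left) | set(right)):
        a = left.get(key, missing)
        b = right.get(key, missing)
        if a != b:
            differences[key] = (
                "<missing>" if a is missing else a,
                "<missing>" if b is missing else b,
            )
    return differences


# Stand-in dictionaries with illustrative values; in practice these would be
# az.from_netcdf("iter11.nc").attrs and mmm_trained.idata.attrs.
saved_attrs = {"id": "a8aaafd8d5e4b752", "model_type": "MMM", "adstock_first": "true"}
trained_attrs = {"id": "a8aaafd8d5e4b752", "model_type": "MMM", "adstock_first": "false"}

print(diff_attrs(saved_attrs, trained_attrs))
```

Note that the saved file and mmm_trained.idata hold the same inference data, so matching attrs there may not rule out a mismatch: the check quoted earlier in this thread compares idata.attrs["id"] against the id of the model rebuilt during load.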
I have a related issue: I can save and load models within the same environment (locally or in Vertex AI) without problems, but I cannot load models produced in Vertex AI on my local device, even though the ids seem to match when I inspect the .nc file.
For context, the model specification is that of the introduction notebook. Below is a screenshot of the model I produced in Vertex AI, which I'm able to load no problem:
The screenshot below is the same .nc file opened using xarray's open_dataset function:
As you can see, opening the .nc file locally gives an error:
EDIT: the from_netcdf function seems to work fine! Of course this is not the same as importing the full model, but an indication that the first step of the code works properly.
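Because the file loads in one environment and not the other, it may be worth comparing package versions between the two. The following is a sketch using only the standard library; the package list is an assumption and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package list is an assumption; adjust to whatever the model depends on.
report = installed_versions(
    ["pymc-marketing", "pymc", "arviz", "xarray", "netCDF4", "h5netcdf"]
)
for name, version in report.items():
    print(f"{name}: {version}")
```

Running this in both Vertex AI and the local environment and comparing the output would show whether a version mismatch is a plausible cause.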