Numpy emits spurious cast warning on discrete models with missing data
Description
Our canonical change point example model now emits a warning about invalid casting:
import numpy as np
import pandas as pd
import pymc as pm

# fmt: off
disaster_data = pd.Series(
[4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
2, 2, 3, 4, 2, 1, 3, np.nan, 2, 1, 1, 1, 1, 3, 0, 0,
1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
3, 3, 1, np.nan, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
)
# fmt: on
years = np.arange(1851, 1962)
with pm.Model() as disaster_model:
    switchpoint = pm.DiscreteUniform("switchpoint", lower=years.min(), upper=years.max())
    early_rate = pm.Exponential("early_rate", 1.0, initval=3)
    late_rate = pm.Exponential("late_rate", 1.0, initval=1)
    rate = pm.math.switch(switchpoint >= years, early_rate, late_rate)
    disasters = pm.Poisson("disasters", rate, observed=disaster_data)
This is due to the conversion to `masked_array` that happens here. The warning can be easily reproduced in isolation:
import numpy as np
np.ma.masked_array(np.array([1, 2, np.nan])).astype(int)
This caused a test to fail in the pymc-examples CI, but more importantly it's ugly for users. Is the juice from `masked_array` worth the squeeze? It seems like it would be less hassle to just make the mask, fill the array ourselves, and pass them around together.
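To make the "mask and fill ourselves" idea concrete, here is a rough sketch. The helper name `split_mask_and_fill` and the default fill value of 0 are assumptions for illustration only; nothing like this exists in PyMC today. The point is that the NaNs are replaced before the cast, so `astype` never sees an invalid value.

```python
import warnings

import numpy as np


def split_mask_and_fill(data, dtype, fill_value=0):
    """Hypothetical helper: return (filled_values, mask) without np.ma."""
    values = np.asarray(data, dtype=float)
    mask = np.isnan(values)
    # Replace missing entries first, so no NaN ever reaches the integer cast.
    filled = np.where(mask, fill_value, values).astype(dtype)
    return filled, mask


with warnings.catch_warnings():
    # Turn any warning into an error to show the cast is clean.
    warnings.simplefilter("error")
    filled, mask = split_mask_and_fill([1, 2, np.nan], dtype=int)

print(filled)  # [1 2 0]
print(mask)    # [False False  True]
```

The filled array and the mask would then have to be passed around as a pair, which is the trade-off against keeping `masked_array`.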
In #7277 we propose adding a `dtype` kwarg to `convert_observed_data`. This can be passed via `np.ma.masked_array(..., dtype=dtype)` to get the correct dtype without casting.