Numpy emits spurious cast warning on discrete models with missing data

Open jessegrabowski opened this issue 10 months ago • 1 comment

Description

Our canonical change point example model now emits a warning about invalid casting:

import numpy as np
import pandas as pd
import pymc as pm

# fmt: off
disaster_data = pd.Series(
    [4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
     3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
     2, 2, 3, 4, 2, 1, 3, np.nan, 2, 1, 1, 1, 1, 3, 0, 0,
     1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
     0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
     3, 3, 1, np.nan, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
     0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
)
# fmt: on
years = np.arange(1851, 1962)

with pm.Model() as disaster_model:
    switchpoint = pm.DiscreteUniform("switchpoint", lower=years.min(), upper=years.max())
    early_rate = pm.Exponential("early_rate", 1.0, initval=3)
    late_rate = pm.Exponential("late_rate", 1.0, initval=1)
    rate = pm.math.switch(switchpoint >= years, early_rate, late_rate)
    disasters = pm.Poisson("disasters", rate, observed=disaster_data)

This is due to the conversion to masked_array that happens here. The warning can be easily reproduced in isolation:

import numpy as np
np.ma.masked_array(np.array([1, 2, np.nan])).astype(int)

This caused a test to fail in the pymc-examples CI, but more importantly it's ugly for users. Is the juice from masked_array worth the squeeze? It seems like it would be less hassle to just make the mask, fill the array ourselves, and pass them around together.
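For illustration, a minimal sketch of the "make the mask and fill the array ourselves" idea. The helper name split_mask_and_fill and the fill value of 0 are hypothetical choices for this example, not anything that exists in PyMC:

```python
import warnings

import numpy as np


def split_mask_and_fill(values, fill_value=0, dtype=int):
    """Hypothetical helper: return (filled, mask) instead of a masked_array."""
    values = np.asarray(values, dtype=float)
    # True wherever an observation is missing.
    mask = np.isnan(values)
    # Replace NaN *before* casting, so the integer cast never sees NaN.
    filled = np.where(mask, fill_value, values).astype(dtype)
    return filled, mask


# Turn warnings into errors to show that the cast stays silent.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    filled, mask = split_mask_and_fill([1, 2, np.nan])
```

Because the missing entries are filled before the cast, the integer conversion only ever sees finite values. The filled array and the mask would then need to be passed around as a pair.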

jessegrabowski, Apr 15 '24 10:04

In #7277 we propose to add a dtype kwarg to convert_observed_data. This can be passed via np.ma.masked_array(..., dtype=dtype) to get the correct dtype without casting.

michaelosthege, May 02 '24 14:05