SDV
Update code to remove the pandas `FutureWarning` messages that are displayed for each row during conditional sampling
Environment Details
SDV 1.12
Problem Description
In our current implementation, conditionally sampling thousands of rows generates thousands of pandas FutureWarning messages, one per row. In my experience, this volume of output can sometimes crash the Jupyter Notebook / Lab page.
FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated.
In a future version, replace will only be used for cases that preserve the categories.
To change the categories, use ser.cat.rename_categories instead.
result = result.replace(nan_name, np.nan)
Expected behavior
SDV should update the pandas methods it uses so that these warnings are no longer displayed.
Steps to Reproduce
import pandas as pd
from sdv.single_table import GaussianCopulaSynthesizer
from sdv.datasets.demo import download_demo
data, metadata = download_demo(
    modality='single_table',
    dataset_name='census_extended'
)
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample_remaining_columns(data[['sex', 'income']].head(10))
Workaround
As an SDV user, you can suppress these warnings for now while we resolve the underlying issue:
# Run before importing pandas
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import pandas as pd
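The filter above hides every FutureWarning, including unrelated ones. A narrower variant, sketched here under the assumption that the warning text starts exactly as quoted in this issue, ignores only this message. The constant and function names are hypothetical, and the pattern may need adjusting if a different pandas version words the warning differently.

```python
import warnings

# Assumed prefix, copied from the warning text quoted in this issue.
# The pattern is a regular expression matched against the start of the message.
REPLACE_WARNING_PATTERN = r'The behavior of Series\.replace'


def ignore_categorical_replace_warning():
    """Ignore only the Series.replace/CategoricalDtype FutureWarning."""
    warnings.filterwarnings(
        'ignore',
        message=REPLACE_WARNING_PATTERN,
        category=FutureWarning,
    )


ignore_categorical_replace_warning()
```

Other FutureWarning messages remain visible with this approach, which makes it less likely that an unrelated deprecation goes unnoticed.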