FutureWarning on `merge` in `cdc_hiv`

Open · benhammondmusic opened this issue 2 years ago · 0 comments

Describe the bug

I can't figure out how to fix the code to remove this warning:

FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)
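The warning contrasts `Index` with `Series`. Given an object-dtype array that holds only numbers (or `nan`), `Series` keeps `dtype=object`, while the `Index` constructor infers a numeric dtype; that inference is what is deprecated. The sketch below shows the `Series` behavior and shows that passing `dtype` explicitly leaves `Index` nothing to infer. The sample values are made up. This does not fix the warning here by itself, because the `Index(...)` call in the trace happens inside pandas' `merge`, not in `cdc_hiv.py`.

```python
import warnings

import numpy as np
import pandas as pd

# Made-up object-dtype array holding only numbers and NaN.
values = np.array([1.5, np.nan, 3.0], dtype=object)

# Series keeps object dtype when given an object ndarray; it does not infer float64.
s = pd.Series(values)

# With an explicit dtype, Index skips the deprecated numeric inference,
# so no FutureWarning can be raised here (escalated to an error just in case).
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    idx = pd.Index(values, dtype=object)

print(s.dtype, idx.dtype)  # object object
```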

Running `pytest` with the `-W error` flag causes warnings to be treated as errors, which makes the full stack trace print. Doing this shows the stack trace below, which makes me think the issue is merging the initial empty df (which has only columns) with the subsequent dfs that have both columns and rows, but I'm not sure.
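If the empty seed frame is the culprit, one way to test that hypothesis is to skip the seed entirely: start the outer merge from the first real frame and fold in the rest with `functools.reduce`. This is a minimal sketch, assuming the frames share their key columns (which `merge` with no `on=` relies on); `merge_all`, `df_a`, `df_b`, and their columns are hypothetical stand-ins, not code from `cdc_hiv.py`. Wrapping the call in `warnings.simplefilter("error", FutureWarning)` gives the same effect as `pytest -W error::FutureWarning`, scoped to one block.

```python
import functools
import warnings

import pandas as pd

# Hypothetical stand-ins for the per-file frames merged in load_atlas_df_from_data_dir.
df_a = pd.DataFrame({"year": ["2019", "2020"], "state": ["AL", "AK"], "hiv_cases": [10, 20]})
df_b = pd.DataFrame({"year": ["2019", "2021"], "state": ["AL", "AZ"], "hiv_deaths": [1, 2]})


def merge_all(dfs):
    """Hypothetical helper: outer-merge frames on their shared columns,
    seeding with the first real frame instead of an empty, columns-only one."""
    return functools.reduce(lambda left, right: left.merge(right, how="outer"), dfs)


# Any FutureWarning raised inside this block becomes an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    combined = merge_all([df_a, df_b])

print(combined)
```

One design note: `functools.reduce` raises `TypeError` on an empty list, so if a directory can contain no matching files, that case would need its own handling rather than falling back to an empty seed.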

Stack Trace

```
.venv/lib/python3.9/site-packages/datasources/cdc_hiv.py:223: in write_to_bq
    alls_df = load_atlas_df_from_data_dir(geo_level, all)
.venv/lib/python3.9/site-packages/datasources/cdc_hiv.py:564: in load_atlas_df_from_data_dir
    output_df = output_df.merge(df, how="outer")
.venv/lib/python3.9/site-packages/pandas/core/frame.py:9351: in merge
    return merge(
.venv/lib/python3.9/site-packages/pandas/core/reshape/merge.py:122: in merge
    return op.get_result()
.venv/lib/python3.9/site-packages/pandas/core/reshape/merge.py:738: in get_result
    self._maybe_add_join_keys(result, left_indexer, right_indexer)
.venv/lib/python3.9/site-packages/pandas/core/reshape/merge.py:916: in _maybe_add_join_keys
    key_col = Index(lvals).where(~mask_left, rvals)
.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:494: in __new__
    arr = _maybe_cast_data_without_dtype(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

subarr = array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan], dtype=object), cast_numeric_deprecated = True

    def _maybe_cast_data_without_dtype(
        subarr: np.ndarray, cast_numeric_deprecated: bool = True
    ) -> ArrayLike:
        """
        If we have an arraylike input but no passed dtype, try to infer
        a supported dtype.

        Parameters
        ----------
        subarr : np.ndarray[object]
        cast_numeric_deprecated : bool, default True
            Whether to issue a FutureWarning when inferring numeric dtypes.

        Returns
        -------
        np.ndarray or ExtensionArray
        """

        result = lib.maybe_convert_objects(
            subarr,
            convert_datetime=True,
            convert_timedelta=True,
            convert_period=True,
            convert_interval=True,
            dtype_if_all_nat=np.dtype("datetime64[ns]"),
        )
        if result.dtype.kind in ["i", "u", "f"]:
            if not cast_numeric_deprecated:
                # i.e. we started with a list, not an ndarray[object]
                return result

>           warnings.warn(
                "In a future version, the Index constructor will not infer numeric "
                "dtypes when passed object-dtype sequences (matching Series behavior)",
                FutureWarning,
                stacklevel=3,
            )
E           FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)

.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:7137: FutureWarning
============================================================================================== short test summary info ==============================================================================================
FAILED python/tests/datasources/test_cdc_hiv.py::test_write_to_bq_race_national - FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)
```
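The `subarr` in the trace is an object-dtype array containing only `nan`, which `lib.maybe_convert_objects` turns into a float dtype, triggering the warning. A quick check that an empty, columns-only frame yields that kind of data: its columns default to `object` dtype, and padding one out with missing values keeps that dtype. This is a rough illustration using `reindex`, not the exact path `merge` takes internally (the trace shows `_maybe_add_join_keys` building the key column with `Index(lvals).where(~mask_left, rvals)`); `seed` and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical seed frame: column names only, no rows (the pattern suspected above).
seed = pd.DataFrame(columns=["year", "state"])

# With no data to infer from, every column gets object dtype.
print(seed.dtypes.tolist())

# Stretching an empty object column to 14 rows fills it with NaN but keeps
# object dtype, the same kind of array as `subarr` in the trace.
left_keys = seed["year"].reindex(range(14)).to_numpy()
print(left_keys.dtype, pd.isna(left_keys).all())
```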

benhammondmusic · Feb 13 '24 21:02