FIDDLE
Error when not discretizing MIMIC-III time-series data - TypeError: bad operand type for unary ~: 'float'
I am running FIDDLE on data extracted from MIMIC-III using the pipeline outlined in FIDDLE-experiments. I have my population of ICU stays and am running FIDDLE with these parameters:
--T=240.0 --dt=1.0 --theta_1=0.003 --theta_2=0.003 --theta_freq=1 --stats_functions 'mean'
and the other defaults found in run_make_all.sh.
I get the following error:
Traceback (most recent call last):
  File "/home/hodgman/miniconda3/envs/FIDDLE-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/hodgman/miniconda3/envs/FIDDLE-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hodgman/FIDDLE-experiments/FIDDLE/FIDDLE/run.py", line 141, in <module>
    main()
  File "/home/hodgman/FIDDLE-experiments/FIDDLE/FIDDLE/run.py", line 138, in main
    X, X_feature_names, X_feature_aliases = FIDDLE_steps.process_time_dependent(df_time_series, args)
  File "/home/hodgman/FIDDLE-experiments/FIDDLE/FIDDLE/steps.py", line 244, in process_time_dependent
    X_all, X_all_feature_names, X_discretization_bins = map_time_series_features(df_time_series, dtypes_time_series, args)
  File "/home/hodgman/FIDDLE-experiments/FIDDLE/FIDDLE/steps.py", line 604, in map_time_series_features
    df.loc[~numeric_mask, col] = np.nan
  File "/home/hodgman/miniconda3/envs/FIDDLE-env/lib/python3.7/site-packages/pandas/core/generic.py", line 1532, in __invert__
    new_data = self._mgr.apply(operator.invert)
  File "/home/hodgman/miniconda3/envs/FIDDLE-env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 325, in apply
    applied = b.apply(f, **kwargs)
  File "/home/hodgman/miniconda3/envs/FIDDLE-env/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 381, in apply
    result = func(self.values, **kwargs)
TypeError: bad operand type for unary ~: 'float'
Do you know what could be causing this error? I was able to determine that it first occurs in column 225958, where numeric_mask contains at least one NaN value. That must mean column 225958 contains None values. However, in my input_data.p file there are no None or NaN variable_values for variable_name == '225958'.
Hello, the numeric_mask is generated from the is_numeric function in helpers.py:
https://github.com/MLD3/FIDDLE/blob/master/FIDDLE/helpers.py#L191
on this line:
https://github.com/MLD3/FIDDLE/blob/master/FIDDLE/steps.py#L601
I agree with your logic, so it is indeed surprising if input_data.p does not contain None/NaN but numeric_mask contains NaN. Perhaps you could try a small example with and without NaNs and apply the is_numeric function to that column?
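A minimal sketch of that experiment follows. The function is_numeric_stand_in is a hypothetical stand-in written for illustration, not the actual is_numeric from helpers.py, and the two small series are made-up data. Under those assumptions, a per-element check that returns only True or False gives a plain boolean mask whether or not the input holds missing values, which would suggest that a NaN in numeric_mask enters somewhere other than the apply call itself.

```python
import numpy as np
import pandas as pd


def is_numeric_stand_in(v):
    """Hypothetical stand-in for a numeric check; NOT FIDDLE's helpers.is_numeric."""
    try:
        float(v)
        return True
    except (TypeError, ValueError):
        return False


# Made-up columns: one without missing values, one with None and NaN.
without_missing = pd.Series(["1.5", "abc", 3], dtype=object)
with_missing = pd.Series(["1.5", None, np.nan, "abc"], dtype=object)

mask_a = without_missing.apply(is_numeric_stand_in)
mask_b = with_missing.apply(is_numeric_stand_in)

# Both masks are boolean, so unary ~ works on either of them.
inverted_a = ~mask_a
inverted_b = ~mask_b
```

If the real is_numeric behaves differently on None or NaN, swapping it in for the stand-in should show that directly.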
is_numeric works when I extract the 225958 feature column from input_data.p to col_data and run
    numeric_mask = col_data.apply(is_numeric)
The resulting numeric_mask contains only True and False values. When I switch one of these booleans to np.nan or a float, I can reproduce the error. I'm going to see if I can extract the ts_mixed dataframe from https://github.com/MLD3/FIDDLE/blob/master/FIDDLE/steps.py#L594 and look at feature 225958.
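The reproduction described above can be sketched as follows. The mask values and index labels are invented for illustration, and the final clean_mask step is only one possible workaround (treating a missing entry as "not numeric"), not something FIDDLE does. A boolean mask with a NaN in it is stored as object dtype, so ~ is applied element by element and fails on the float.

```python
import numpy as np
import pandas as pd

# Made-up mask: the NaN forces object dtype instead of bool.
mask = pd.Series([np.nan, True, False], index=[10, 11, 12])


def try_invert(m):
    """Return the TypeError message raised by ~m, or None if ~m succeeds."""
    try:
        ~m
        return None
    except TypeError as exc:
        return str(exc)


error_message = try_invert(mask)

# Index labels of the entries that are neither True nor False.
bad_rows = mask[mask.isna()].index.tolist()

# Possible workaround (an assumption): count missing entries as False.
clean_mask = mask.eq(True)
inverted = ~clean_mask
```

Inspecting bad_rows on the real numeric_mask for feature 225958 should point at the rows of ts_mixed worth looking at.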