autofeat
Input contains NaN, infinity or a value too large for dtype('float32') on fit_transform
I'm facing the following issue when running AutoFeatRegressor.fit_transform(featuresDf, targetFeature):
I already checked for infinity and NaN values, and also converted everything to float32. Any pointers? Thanks!
Update: tried the same set of inputs with FeatureSelector and everything is working great.
Update 2: posted this question on StackOverflow
Exception
c:\dox\rnd\ml-pipeline-notebooks\modules\autoFeat.py in augmentFeatures(features, targetFeature, verbose)
41 print(featuresDf.head())
42 print(targetFeature)
---> 43 newFeatureDf = autoFeatRegressor.fit_transform(featuresDf, targetFeature)
44
45 if verbose:
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
112 ):
113 type_err = "infinity" if allow_nan else "NaN, infinity"
--> 114 raise ValueError(
115 msg_err.format(
116 type_err, msg_dtype if msg_dtype is not None else X.dtype
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Logs (verbose)
From the StackOverflow question it seems the problem was that one of your original input features had a standard deviation of 0 (i.e., all values were the same)? This would also explain the RuntimeWarning shown above when dividing by the stddev.
I guess it might make sense to add a check for zero-variance features somewhere and exclude them.
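Until such a check exists in the library, something like the following could be run on the input before calling fit_transform. This is only a minimal sketch, not autofeat's own code: the helper name drop_constant_columns and the sample data are made up for illustration, and it uses plain pandas to drop every column that holds a single distinct value (and therefore has a standard deviation of 0).

```python
import numpy as np
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without constant (zero-variance) columns.

    Hypothetical helper for illustration; not part of autofeat.
    """
    # A column with at most one distinct value has a standard deviation of 0,
    # which would lead to a division by zero when features are standardized.
    distinct_counts = df.nunique()
    keep = distinct_counts[distinct_counts > 1].index
    return df[keep].copy()


# Made-up sample data: column "b" is constant.
features_df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [5.0, 5.0, 5.0, 5.0],
        "c": [0.1, 0.4, 0.2, 0.3],
    },
    dtype=np.float32,
)

filtered_df = drop_constant_columns(features_df)
print(list(filtered_df.columns))
```

The filtered frame can then be passed to fit_transform in place of the original one.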
I think you can apply a scaling method to your data; it works for me. Prefer StandardScaler or a bounded-range scaler such as MinMaxScaler. You can also try robust scaling, but it may still produce extremely large values.
It also helps to clip the features before scaling, but it will work without clipping anyway.
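For reference, here is a rough sketch of clipping followed by bounded-range scaling. The helper name clip_and_minmax, the 1%/99% quantile limits, and the sample data are assumptions chosen for illustration. The scaling is written out by hand in pandas; scikit-learn's MinMaxScaler performs the same kind of min-max rescaling.

```python
import numpy as np
import pandas as pd


def clip_and_minmax(df: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    """Clip each column to its [lower_q, upper_q] quantiles, then scale to [0, 1].

    Hypothetical helper for illustration; the quantile limits are arbitrary.
    """
    lower = df.quantile(lower_q)
    upper = df.quantile(upper_q)
    # Per-column clipping: the Series of limits is aligned with the columns.
    clipped = df.clip(lower=lower, upper=upper, axis=1)

    col_min = clipped.min()
    col_range = clipped.max() - col_min
    # Constant columns have range 0; use 1 instead so they map to 0, not NaN.
    col_range = col_range.replace(0, 1)
    return (clipped - col_min) / col_range


# Made-up sample data with one extreme value in column "x".
raw_df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 1e30],
        "y": [10.0, 20.0, 30.0, 40.0],
    }
)

scaled_df = clip_and_minmax(raw_df)
print(scaled_df)
```

Clipping only limits how far the outliers reach. A very large value still compresses the rest of the column toward 0 after min-max scaling, so tighter quantile limits may be needed for heavily skewed features.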