tsfresh icon indicating copy to clipboard operation
tsfresh copied to clipboard

Update tsfresh.feature_extraction.feature_calculators.skewness to make it consistent with the design principle of not ignoring nan

Open yitao-yu opened this issue 1 year ago • 3 comments

The current implementation of tsfresh.feature_extraction.feature_calculators.skewness utilizes a function from pandas that ignores nan by default, which is not consistent with the performance of other functions and the design principle.

This update provides a fix by adding a skipna=False argument. But since it's discovered by chance, I do suggest a review of other parts of the code.

More

yitao-yu avatar Oct 06 '23 08:10 yitao-yu

Codecov Report

All modified lines are covered by tests :white_check_mark:

Comparison is base (f3a6a7c) 93.43% compared to head (1ab70aa) 93.43%. Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1051   +/-   ##
=======================================
  Coverage   93.43%   93.43%           
=======================================
  Files          20       20           
  Lines        1888     1888           
  Branches      372      372           
=======================================
  Hits         1764     1764           
  Misses         85       85           
  Partials       39       39           
Files Coverage Δ
tsfresh/feature_extraction/feature_calculators.py 92.46% <100.00%> (ø)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Oct 09 '23 21:10 codecov-commenter

Thanks @yitao-yu - you are completely correct! It would be nice to also check all other calculators. Do you have an idea on how to do this automatically?

So after much closer looks, I believe that the package would throw an error(there is a _check_nan function called in the WideTsFrameAdapter class) if you are utilizing tsfresh.feature_extraction.feature_calculators.

For individual features, maybe some sort of testing program can be written to check if they have similar problems. I'm not really sure how to do that.

Or an easier alternative is that I can just write programs to look for features that are calling functions that are implemented under pandas.Series and also has an optional argument skipna.

yitao-yu avatar Oct 17 '23 15:10 yitao-yu

Sorry for the long pause @yitao-yu - and thanks for looking into it. I think for now, including your change for this function is a good start. Could you fix the code style issue? The we can merge!

nils-braun avatar Oct 24 '23 19:10 nils-braun

Merged in #1066

nils-braun avatar Mar 11 '24 18:03 nils-braun