evidently icon indicating copy to clipboard operation
evidently copied to clipboard

fix: Drop NaNs only in used columns

Open danieljmv01 opened this issue 2 years ago • 2 comments

  • Select columns used in CatTargetDriftAnalyzer and then filter Nans and infinities.
  • Stop doing the replace and drop of these values inplace.
  • Check that columns are not empty and raise a more informative error if so.

Closes #241

Signed-off-by: Daniel J. Morales Velásquez [email protected]

danieljmv01 avatar May 30 '22 21:05 danieljmv01

Workflow log before creating the PR: https://github.com/danieljmv01/evidently/actions/runs/2411394304

danieljmv01 avatar May 30 '22 21:05 danieljmv01

Rebased and updated _remove_nans_and_infinities to just keep valid values without replacing data, just in case : https://github.com/evidentlyai/evidently/blob/fa546f78877b4242a4d5acf482951af9095e9718/src/evidently/pipeline/pipeline.py#L38-L40

Log: https://github.com/danieljmv01/evidently/actions/runs/2415156752

danieljmv01 avatar May 31 '22 13:05 danieljmv01

Hi @danieljmv01 , first of all we want to thank you for your pull request. You brought up very important topic and heavily increased the priority of NA values filtering.

We have carefully reviewed you changes (we really like it!), and finally decided to go with the same filtration logic, but alternative technical implementation without the use of masks. This was mainly dictated by the future changes we are aiming to make and ease of support.

emeli-dral avatar Sep 20 '22 08:09 emeli-dral