Refactor ROC module

Open ljchang opened this issue 7 years ago • 3 comments

The ROC plot has had a lot of problems. Right now, forced-choice accuracy does not always seem to be correct.

We should refactor this and write proper tests.

We also need to address the balanced accuracy p-value at some point (try permutations).
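One way the permutation idea could look, as a minimal sketch rather than the nltools implementation: shuffle the true labels, recompute balanced accuracy each time, and report the share of shuffles that score at least as high as the observed value. The function names `balanced_accuracy` and `permutation_p_value`, the default of 5000 permutations, and the add-one correction are all assumptions made for illustration.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = np.mean(y_pred[y_true])    # true positive rate
    specificity = np.mean(~y_pred[~y_true])  # true negative rate
    return (sensitivity + specificity) / 2


def permutation_p_value(y_true, y_pred, n_permutations=5000, seed=0):
    """One-sided permutation p-value for balanced accuracy (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    observed = balanced_accuracy(y_true, y_pred)
    # Null distribution: shuffling labels breaks any link to the predictions
    # while keeping the class counts fixed.
    null = np.array([
        balanced_accuracy(rng.permutation(y_true), y_pred)
        for _ in range(n_permutations)
    ])
    # Add-one correction so the p-value is never exactly zero.
    p_value = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, p_value
```

For forced-choice data, the shuffle would probably need to happen within each subject instead of across the whole sample; this sketch does not handle that case.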

ljchang avatar Nov 27 '17 22:11 ljchang

The forced-choice test might be affected by commit 3bb8db388abc887b35504a37ff29daa9e33db8a7 by @ljchang.

ljchang avatar Nov 27 '17 23:11 ljchang

In case this is helpful, I noticed that the input data type silently changes the results for the same data (see the example below). I think the input variables should be explicitly coerced to a specific type, or an error should be raised if they are not of the expected type, to avoid these issues.

I get different results for each of these examples:

from nltools.analysis import Roc
import numpy as np
import pandas as pd

inputs = np.array([1, 2, 1, 2, 2, 1, 1, 2])
outcomes = np.array([0, 1, 0, 1, 0, 1, 0, 1])
subs = np.array([1, 1, 2, 2, 3, 3, 4, 4])

# With int outcomes
roc = Roc(inputs, outcomes)
roc.calculate()
roc.summary()

# With numpy boolean outcomes (outcomes stays boolean for all examples below)
outcomes = outcomes.astype(bool)
roc = Roc(inputs, outcomes)
roc.calculate()
roc.summary()

# Forced choice
# With int inputs
roc = Roc(input_values=inputs,
          binary_outcome=outcomes,
          forced_choice=subs)
roc.calculate()
roc.summary()

# With float inputs
roc = Roc(input_values=inputs.astype(float),
          binary_outcome=outcomes,
          forced_choice=subs)
roc.calculate()
roc.summary()

# With pd Series outcomes
roc = Roc(input_values=inputs.astype(float),
          binary_outcome=pd.Series(outcomes.astype(bool)),
          forced_choice=subs)
roc.calculate()
roc.summary()
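The kind of explicit coercion I have in mind could look roughly like the sketch below. It is not existing nltools code: `coerce_roc_inputs` is a hypothetical helper, and the choice to cast inputs to float and outcomes to bool is only an assumption about what the class should expect.

```python
import numpy as np


def coerce_roc_inputs(input_values, binary_outcome):
    """Return (float array, bool array) or raise ValueError (hypothetical helper)."""
    # np.asarray also accepts lists and pandas Series, so all of them
    # end up as plain 1-D numpy arrays.
    values = np.asarray(input_values, dtype=float).ravel()
    outcome = np.asarray(binary_outcome).ravel()

    if values.shape != outcome.shape:
        raise ValueError("input_values and binary_outcome must have the same length")

    if outcome.dtype != bool:
        # Accept only 0/1 codes; anything else is ambiguous and rejected.
        if not np.isin(outcome, (0, 1)).all():
            raise ValueError("binary_outcome must contain only 0/1 or boolean values")
        outcome = outcome.astype(bool)

    return values, outcome
```

With something like this called at the top of the constructor, the int, float, bool, and Series variants above would all reach the calculation as the same pair of arrays.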

mpcoll avatar Apr 01 '21 18:04 mpcoll

Thanks for this. We are planning a major refactor of this module soon, as it is a mess.

ljchang avatar Apr 01 '21 18:04 ljchang