TuRF value error

Open J-Bleker opened this issue 7 years ago • 8 comments

Hello,

I am currently trying to use TuRF to get my feature importance scores, and my code is almost the same as the example code in the docs:

```python
from skrebate.turf import TuRF

# Take x & y from dataframes
# (x: DataFrame of feature columns, y: label column; user data, not shared)
X = x.values
Y = y.values

# Take feature names as header
header = x.columns

# Implement TuRF with ReliefF as algorithm
tf = TuRF(core_algorithm="ReliefF", n_features_to_select=2, pct=0.5, verbose=True)
tf.fit(X, Y, header)
```

Output:

```
Created distance array in 0.03900003433227539 seconds.
Feature scoring under way ...
Completed scoring in 12.943000078201294 seconds.
Created distance array in 0.02700018882751465 seconds.
Feature scoring under way ...
Completed scoring in 6.190999984741211 seconds.
Created distance array in 0.004999876022338867 seconds.
Feature scoring under way ...
Completed scoring in 3.2160000801086426 seconds.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-63-e796aad72373> in <module>()
      1 tf = TuRF(core_algorithm="ReliefF", n_features_to_select=2, pct=0.5,verbose=True)
----> 2 tf.fit(X, Y, header)

C:\ProgramData\Anaconda3\lib\site-packages\skrebate\turf.py in fit(self, X, y, headers)
    164                 self.feature_importances_.append(low_score - reduction * self._lost[i]) #append discounted score as a marker of when the feature was removed.
    165             else: #Feature made final cut
--> 166                 score_index = self.headers.index(i)
    167                 self.feature_importances_.append(core_fit.feature_importances_[score_index])
    168 

ValueError: 'mean_surface_score' is not in list
```

This is a very odd error in my opinion, since I am certain all feature names are in the header. Does anyone know the solution to this error? Unfortunately I cannot supply my data, but it is just a DataFrame with about 150 samples, a number of feature columns, and one column with the labels (X does not contain this column).

Thanks!

J-Bleker • Jun 15 '18

Thanks for the issue report; I'll check this out and get back to you ASAP.

ryanurbs • Jun 15 '18

Hi @BBeuker, did you find any solution for this?

swatisaini • Aug 09 '18

Is it possible to share the data you are trying to analyze so we can recreate and track down the issue? How many features are in your dataset?

Ryan

Ryan J. Urbanowicz, Ph.D., Assistant Professor of Informatics, Perelman School of Medicine, University of Pennsylvania

629 Blockley Hall, 423 Guardian Drive, University of Pennsylvania, Philadelphia, Pennsylvania 19104

W. Phone: 215-746-4225 | C. Phone: 802-299-9461 | Web: www.ryanurbanowicz.com | Twitter: www.twitter.com/DocUrbs

ryanurbs • Aug 09 '18

My intuition is that you are using a dataset with a small number of features, and TuRF is removing too many features, which leads to the missing variable label.

ryanurbs • Aug 09 '18
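To make the elimination schedule concrete, here is a minimal pure-Python sketch of how many features each TuRF pass would score. It assumes (an assumption, not taken from turf.py) that each pass keeps `int(n * pct)` of the current features and that iteration stops once no more than `n_features_to_select` remain; the function name `turf_schedule` is hypothetical.

```python
def turf_schedule(n_features, pct=0.5, n_features_to_select=2):
    """Hypothetical sketch of a TuRF-style elimination schedule.

    Assumes each pass keeps int(n * pct) of the current features and that
    iteration stops once n <= n_features_to_select. The real turf.py may
    round or stop differently.
    """
    if not 0 < pct < 1 or n_features_to_select < 1:
        raise ValueError("need 0 < pct < 1 and n_features_to_select >= 1")
    scored = []
    n = n_features
    while n > n_features_to_select:
        scored.append(n)          # features scored by the core algorithm on this pass
        n = max(int(n * pct), 1)  # features kept for the next pass (assumed rounding)
    return scored, n


print(turf_schedule(1500))
```

Under these assumptions, `turf_schedule(1500)` scores 1500, 750, 375, 187, 93, 46, 23, 11 and 5 features on successive passes and ends with 2, so even a large dataset passes through several odd feature counts before reaching the target.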

Hi Ryan,

I have around 1500 features, all with binary values. It's a dataset of biological and chemical descriptors for adverse drug prediction.

swatisaini • Aug 09 '18

Any chance of sharing the data and the complete code you ran, so we can track down the issue?

ryanurbs • Aug 09 '18

Hi Ryan,

I apologize, but I won't be able to share the data because of confidentiality issues.

Here is the guide I followed to implement the code:

https://epistasislab.github.io/scikit-rebate/using/#general-usage-guidelines

swatisaini • Aug 10 '18

I'm having the same problem, and I'm pretty sure it's the same one reported in #54. On line 133 of turf.py, non_select is not the complement of select when pct = 0.5 and the number of features is odd. If pct != 0.5, I think you would get the crash regardless.

Tipulidae • Oct 22 '18
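The select/non_select mismatch described in the last comment can be reproduced without skrebate. The sketch below is hypothetical and is not the code from turf.py: `split_independent` rounds the kept and removed portions separately, `split_complement` computes the cut point once, and the feature names are made up. With five features and `pct=0.5`, the independent version leaves one feature in neither list, and looking that feature up by name afterwards raises the same kind of `ValueError` as in the traceback.

```python
def split_independent(ranked, pct):
    """Hypothetical buggy split: the two sides are rounded independently.

    Not the actual turf.py code; it only mimics the reported symptom.
    """
    n = len(ranked)
    select = ranked[: int(n * pct)]               # keep the top int(n * pct)
    non_select = ranked[n - int(n * (1 - pct)):]  # drop the bottom int(n * (1 - pct))
    return select, non_select


def split_complement(ranked, pct):
    """Compute the cut point once, so non_select is exactly the rest."""
    k = int(len(ranked) * pct)
    return ranked[:k], ranked[k:]


def unaccounted(ranked, split, pct):
    """Features that end up in neither the kept nor the removed list."""
    select, non_select = split(ranked, pct)
    return set(ranked) - set(select) - set(non_select)


ranked = ["f1", "f2", "f3", "f4", "f5"]  # hypothetical names, best score first
print(unaccounted(ranked, split_independent, 0.5))  # {'f3'}
print(unaccounted(ranked, split_complement, 0.5))   # set()

select, _ = split_independent(ranked, 0.5)
try:
    select.index("f3")  # a later lookup by name, similar to headers.index(i)
except ValueError as err:
    print(err)  # 'f3' is not in list
```

Deriving `non_select` as the remainder of the same ranked list after a single cut index guarantees that the two lists are exact complements for any `pct` and any feature count.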