TuRF value error
Hello,
I am currently trying to use TuRF to get my feature importance scores, and my code is almost the same as the example code in the docs:
from skrebate.turf import TuRF
# Take x & y from dataframes
X = x.values
Y = y.values
# Take feature names as header
header = x.columns
# Implement TuRF with ReliefF as algorithm
tf = TuRF(core_algorithm="ReliefF", n_features_to_select=2, pct=0.5,verbose=True)
tf.fit(X, Y, header)
# Output
Created distance array in 0.03900003433227539 seconds.
Feature scoring under way ...
Completed scoring in 12.943000078201294 seconds.
Created distance array in 0.02700018882751465 seconds.
Feature scoring under way ...
Completed scoring in 6.190999984741211 seconds.
Created distance array in 0.004999876022338867 seconds.
Feature scoring under way ...
Completed scoring in 3.2160000801086426 seconds.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-63-e796aad72373> in <module>()
1 tf = TuRF(core_algorithm="ReliefF", n_features_to_select=2, pct=0.5,verbose=True)
----> 2 tf.fit(X, Y, header)
C:\ProgramData\Anaconda3\lib\site-packages\skrebate\turf.py in fit(self, X, y, headers)
164 self.feature_importances_.append(low_score - reduction * self._lost[i]) #append discounted score as a marker of when the feature was removed.
165 else: #Feature made final cut
--> 166 score_index = self.headers.index(i)
167 self.feature_importances_.append(core_fit.feature_importances_[score_index])
168
ValueError: 'mean_surface_score' is not in list
A very odd error in my opinion, since I am certain all feature names are in the header. Does anyone know the solution to this error? Unfortunately I cannot supply my data, but it is just a dataframe with about 150 samples, a certain number of features as columns, and one column with the labels (X does not contain this column).
Thanks!
Thanks for the issue report, I'll check this out and get back to you ASAP.
Hi @BBeuker, did you find any solution for this?
Is it possible to share the data you are trying to analyze so we can recreate and track down the issue? How many features are in your dataset?
Ryan
Ryan J. Urbanowicz, Ph.D. Assistant Professor of Informatics Perelman School of Medicine University of Pennsylvania
629 Blockley Hall 423 Guardian Drive University of Pennsylvania Philadelphia, Pennsylvania 19104
W. Phone: 215-746-4225 C. Phone: 802-299-9461 Web: www.ryanurbanowicz.com Twitter: www.twitter.com/DocUrbs
My intuition is that you are using a dataset with a small number of features, and TuRF is removing too many features, leading to your missing variable label.
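A rough way to check that intuition is to count how many features would survive each round. The sketch below assumes every round drops the fraction `pct` of whatever features remain, which is one reading of the `pct` parameter and is not confirmed in this thread. `features_per_round` is a hypothetical helper, not part of skrebate, and the three rounds simply mirror the three scoring passes in the log above.

```python
def features_per_round(n_features, pct, n_rounds):
    """Return the feature count before round 1 and after each round.

    Hypothetical helper: assumes each round keeps round(n * (1 - pct))
    of the n features it started with.
    """
    counts = [n_features]
    for _ in range(n_rounds):
        counts.append(int(round(counts[-1] * (1 - pct))))
    return counts


small = features_per_round(10, 0.5, 3)     # a small, made-up feature count
large = features_per_round(1500, 0.5, 3)   # a larger, made-up feature count
print(small)
print(large)
```

If the count after the last round falls below `n_features_to_select`, the small-dataset explanation is plausible; if it stays well above, something else is likely going on.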
Hi Ryan,
I have around 1500 features, all binary. It's a biological and chemical descriptor dataset for adverse drug prediction.
Any chance of sharing the data and the complete code you ran, so we can track down the issue?
Hi Ryan,
I apologize, but I won't be able to share the data because of confidentiality issues.
Here is the link to the documentation I followed when implementing the code:
https://epistasislab.github.io/scikit-rebate/using/#general-usage-guidelines
I'm having the same problem. Pretty sure it's the same as reported in #54 as well. On line 133 in turf.py, non_select is not the complement of select, if pct = 0.5 and the number of features is odd. If pct != 0.5, I think you would get the crash regardless.
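To make the complement problem concrete, here is a minimal sketch. It does not reproduce the code in turf.py. It assumes the kept and removed index sets are both sliced from the same score ordering using the same count, which is my guess at the kind of mismatch described above. `split_features` and all variable names are hypothetical.

```python
import numpy as np


def split_features(scores, pct):
    """Split feature indices into kept and removed sets by score.

    Hypothetical helper for illustration only; assumes at least one
    feature is kept.
    """
    n = len(scores)
    n_keep = int(np.round(n * (1 - pct)))   # features retained this round
    order = np.argsort(scores)              # indices, lowest score first

    select = order[-n_keep:]                # the n_keep highest-scoring features
    # Assumed faulty form: the removed side reuses n_keep as its length.
    non_select_same_count = order[:n_keep]
    # Complement form: everything that was not selected.
    non_select_complement = order[:n - n_keep]
    return select, non_select_same_count, non_select_complement


scores = np.array([0.1, 0.5, 0.3, 0.9, 0.7])   # 5 features, an odd count
select, same_count, complement = split_features(scores, pct=0.5)

all_features = set(range(len(scores)))
missing = all_features - set(select) - set(same_count)
print("in neither set (same-count slice):", sorted(missing))
print("in neither set (complement slice):",
      sorted(all_features - set(select) - set(complement)))
```

With five features and pct = 0.5, two features are kept and the same-count slice removes only two, so one feature ends up in neither set. A later lookup of that feature's name would then fail in the way the traceback shows. Slicing the removed side as `order[:n - n_keep]` accounts for every feature regardless of pct or parity.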