features_selection function is returning same features for different target classes

Open MathewKevin opened this issue 3 years ago • 2 comments

Hi, I'm building a binary classifier that takes text data as input. I tried to generate features with the features_selection function, but it returned the same number of features, and the same top features, for the two target classes, which looks incorrect. Am I supposed to generate the features separately for each class?

#Feature Selection
X_names, df_selection = features_selection(X_train, df_train["Target"], X_names, top=None, print_top=25)

Output:

features selection: from 10,000 to 7,026
 
# Curate:
  . selected features: 7026
  . top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune
 
# Discard:
  . selected features: 7026
  . top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune

df_selection[df_selection['feature'] == 'protein']

feature	score	y
protein	1.0	Curate
protein	1.0	Discard
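
One possible explanation, sketched here under an assumption: if features_selection scores each class with a one-vs-rest chi-squared test (this is not confirmed anywhere in this thread), then with only two classes the test for "Curate" and the test for "Discard" split the documents the same way, so the scores come out identical. The toy matrix and labels below are made up for illustration; only `sklearn.feature_selection.chi2` is a real library call.

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical term-count matrix: 6 documents x 3 features.
X = np.array([
    [3, 0, 1],
    [2, 1, 0],
    [4, 0, 2],
    [0, 3, 1],
    [1, 2, 0],
    [0, 4, 1],
])
y = np.array(["Curate", "Curate", "Curate", "Discard", "Discard", "Discard"])

scores = {}
for cat in np.unique(y):
    # One-vs-rest test (assumed scoring scheme). With two classes,
    # (y == "Curate") and (y == "Discard") describe the same partition
    # of the documents, just with the labels swapped.
    stat, p = chi2(X, y == cat)
    scores[cat] = 1 - p

# Compare the per-feature scores obtained for the two classes.
same = np.allclose(scores["Curate"], scores["Discard"])
print(same)
```

If that assumption holds, identical lists for the two classes would be expected in the binary case, and the score alone would not say which class a feature points to.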

May 03 '22 10:05 MathewKevin

Hi, please contact me on LinkedIn and I'll try to help you.

May 10 '22 15:05 mdipietro09