features_selection function is returning the same features for different target classes
Hi, I'm building a binary classifier that takes text data as input. I tried to select features with the features_selection function, but it returns the same number of features (and the same top features) for the two target classes, which seems incorrect. Am I supposed to run the feature selection separately for each class?
#Feature Selection
X_names, df_selection = features_selection(X_train, df_train["Target"], X_names, top=None, print_top=25)
Output:
features selection: from 10,000 to 7,026
# Curate:
. selected features: 7026
. top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune
# Discard:
. selected features: 7026
. top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune
df_selection[df_selection['feature'] == 'protein']
| feature | score | y |
|---|---|---|
| protein | 1.0 | Curate |
| protein | 1.0 | Discard |
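One possible explanation for the identical output, sketched under an assumption: if features_selection scores each class one-vs-rest with a chi-square test (this is a guess about its internals, not something confirmed in this thread), then with only two classes "Curate vs. rest" and "Discard vs. rest" are the same test, so every feature gets the same score for both classes. The helper `chi2_one_vs_rest`, the toy count matrix, and the feature names below are hypothetical and only illustrate the symmetry; they are not the library's code.

```python
import numpy as np

def chi2_one_vs_rest(X, y, positive_class):
    """Chi-square statistic per feature for `positive_class` vs. all other rows."""
    X = np.asarray(X, dtype=float)
    is_pos = (np.asarray(y) == positive_class).astype(float)
    Y = np.column_stack([1.0 - is_pos, is_pos])          # columns: rest, positive
    observed = Y.T @ X                                   # feature totals per group
    expected = np.outer(Y.mean(axis=0), X.sum(axis=0))   # totals if independent
    return ((observed - expected) ** 2 / expected).sum(axis=0)

# Hypothetical toy term counts: 6 documents x 3 features.
names = ["protein", "antibody", "article"]
X = np.array([[3, 0, 1],
              [2, 0, 1],
              [4, 1, 1],
              [0, 2, 1],
              [0, 3, 1],
              [1, 2, 1]])
y = np.array(["Curate", "Curate", "Curate", "Discard", "Discard", "Discard"])

scores_curate = chi2_one_vs_rest(X, y, "Curate")
scores_discard = chi2_one_vs_rest(X, y, "Discard")

# With two classes the two one-vs-rest tests are mirror images of each other.
print(np.allclose(scores_curate, scores_discard))

# The score says *that* a feature is associated with the target, not *which*
# class it favours. Comparing per-class means gives the direction.
diff = X[y == "Curate"].mean(axis=0) - X[y == "Discard"].mean(axis=0)
for name, score, d in zip(names, scores_curate, diff):
    side = "Curate" if d > 0 else "Discard" if d < 0 else "neither"
    print(f"{name}: chi2={score:.3f}, leans towards {side}")
```

If that assumption holds, running the selection separately per class would not change the result for a binary target; a per-class comparison such as the mean difference above is one way to see which class a selected feature leans towards.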
Hi, please contact me on LinkedIn and I'll try to help you.
In reply to Mathew Kevin's post of Tue, 3 May 2022 at 12:09.