TextFeatureSelection
'CountVectorizer' object has no attribute 'get_feature_names'
from TextFeatureSelection import TextFeatureSelection
#Binary classification
input_doc_list=new_df_4['txt'].values.tolist()
target=new_df_4['target'].values.tolist()
fsOBJ=TextFeatureSelection(target=target,input_doc_list=input_doc_list)
result_df=fsOBJ.getScore()
print(result_df)
That's my code and the error:
`---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell 102 in 7
      5 target=new_df_4['target'].values.tolist()
      6 fsOBJ=TextFeatureSelection(target=target,input_doc_list=input_doc_list)
----> 7 result_df=fsOBJ.getScore()
      8 print(result_df)

File /opt/homebrew/lib/python3.10/site-packages/TextFeatureSelection.py:409, in TextFeatureSelection.getScore(self)
    407 else:
    408 if len(set(self.target))==2:
--> 409 values_df=self._getvalues_singleclass()
    410 return values_df
    411 elif len(set(self.target))>2:

File /opt/homebrew/lib/python3.10/site-packages/TextFeatureSelection.py:268, in TextFeatureSelection._getvalues_singleclass(self)
    265 label_array=self._get_binary_label(self.target)
    267 #get word, count, binary matrix
--> 268 word_list,count_list,word_binary_matrix=self._get_term_binary_matrix(self.input_doc_list)
    270 #get ABCDN
    271 A,B,C,D,N=self._get_ABCD(word_binary_matrix,label_array)

File /opt/homebrew/lib/python3.10/site-packages/TextFeatureSelection.py:231, in TextFeatureSelection._get_term_binary_matrix(self, input_doc_list)
    229 vectorizer = CountVectorizer()
    230 X = vectorizer.fit_transform(input_doc_list)
--> 231 word_list = vectorizer.get_feature_names()
    233 #binary word document matrix
    234 vectorizer = CountVectorizer(binary=True)

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'`
Which version of the scikit-learn library is installed on your computer?
I have scikit-learn 1.2.2.
Some people suggest changing `get_feature_names` to `get_feature_names_out`.
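For anyone hitting the same error: scikit-learn deprecated `get_feature_names` in 1.0 and removed it in 1.2, which matches the traceback above on 1.2.2. Here is a minimal sketch of a helper that works on both old and new releases. The name `get_vocabulary` is hypothetical and not part of TextFeatureSelection or scikit-learn; it only assumes the vectorizer has one of the two methods.

```python
def get_vocabulary(vectorizer):
    """Return the feature names of a fitted vectorizer as a plain list.

    Hypothetical helper, not part of TextFeatureSelection or scikit-learn.
    """
    # Newer scikit-learn releases provide get_feature_names_out().
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    # Older releases only provide get_feature_names().
    return list(vectorizer.get_feature_names())
```

If you patch your local copy, the failing line 231 in the traceback (`word_list = vectorizer.get_feature_names()`) could presumably become `word_list = get_vocabulary(vectorizer)`. I have not checked whether the library calls the old method anywhere else.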
If you can downgrade scikit-learn to an older version (one released before 1.2, which still has `get_feature_names`), it will work.
I tried calculating information gain by hand and comparing it with the library's output, but the values are very different. Can you tell me how the information gain equation is implemented in your library?
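For comparison, here is a minimal sketch of the textbook information gain for one term in the binary case, written in terms of document counts. This is not the library's implementation, which I have not verified. The meaning of `A`, `B`, `C`, `D` below is an assumption borrowed from the usual convention (the traceback only shows that the library computes values with those names), and the log base 2 is also an assumption.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(A, B, C, D):
    """Textbook information gain of a term for a binary class label.

    Assumed meaning of the counts (not confirmed against the library):
      A: documents containing the term, in the positive class
      B: documents containing the term, not in the positive class
      C: documents without the term, in the positive class
      D: documents without the term, not in the positive class
    """
    N = A + B + C + D
    # Entropy of the class label before looking at the term.
    h_class = entropy([A + C, B + D])
    # Entropy of the class label after splitting on term presence.
    h_given_term = ((A + B) / N) * entropy([A, B]) + ((C + D) / N) * entropy([C, D])
    return h_class - h_given_term
```

A different log base (natural log instead of base 2) or a different definition of the counts changes the numbers, so either could explain a mismatch with a hand calculation.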