An-Amharic-News-Text-classification-Dataset
Improve Accuracy of The Model
I ran your code. The `naive_bayes.GaussianNB` estimator you implemented reached an accuracy of 0.6. It is a suitable estimator for text data. However, since you converted the text values into numeric form, you can use other estimators such as `LogisticRegression` or a linear SVM for better accuracy. I added a `LogisticRegression` model to your code without any other modification and got an accuracy of around 0.79.
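To illustrate the kind of swap described above, here is a minimal sketch that trains the three estimators on the same numeric features. It assumes scikit-learn with TF-IDF features, which may differ from the feature extraction used in the repository. The `compare_estimators` helper and the toy corpus are hypothetical stand-ins, not the actual dataset or code, so the scores it produces say nothing about the 0.6 and 0.79 figures.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC


def compare_estimators(texts, labels, seed=0):
    """Hypothetical helper: fit each estimator on identical TF-IDF features."""
    train_txt, test_txt, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    # Assumption: TF-IDF stands in for whatever text-to-numeric step the repo uses.
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_txt)
    X_test = vectorizer.transform(test_txt)

    estimators = {
        "GaussianNB": GaussianNB(),
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "LinearSVC": LinearSVC(),
    }
    scores = {}
    for name, estimator in estimators.items():
        if name == "GaussianNB":
            # GaussianNB does not accept sparse input, so densify for it only.
            estimator.fit(X_train.toarray(), y_train)
            predictions = estimator.predict(X_test.toarray())
        else:
            estimator.fit(X_train, y_train)
            predictions = estimator.predict(X_test)
        scores[name] = accuracy_score(y_test, predictions)
    return scores


# Toy corpus for illustration only (not the Amharic news data).
toy_texts = [
    "team wins the football match", "striker scores two goals",
    "coach praises the players", "league final ends in a draw",
    "club signs a new goalkeeper", "runner breaks the marathon record",
    "bank raises interest rates", "market shares fall sharply",
    "company reports higher profit", "government plans new tax",
    "exports grow this quarter", "currency loses value again",
]
toy_labels = ["sport"] * 6 + ["business"] * 6

scores = compare_estimators(toy_texts, toy_labels)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

On the real data, the same comparison would be run on the repository's own train/test split so the numbers stay comparable with the reported ones.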
In addition, you can improve the accuracy by:
- Removing NaN and null values from the dataset.
- For feature extraction, use only `headline`, `category` and `article` as your data, as the rest of the columns are not necessary.
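The two cleaning steps above could look roughly like this with pandas. This is a sketch under assumptions: the column names `headline`, `category` and `article` come from this thread, while the `clean_news_frame` helper, the extra `date` and `link` columns, and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

KEEP_COLUMNS = ["headline", "category", "article"]


def clean_news_frame(df):
    """Hypothetical helper: keep the three useful columns, then drop missing rows."""
    # Select columns first so a missing value in an unused column
    # does not cause an otherwise complete row to be dropped.
    return df[KEEP_COLUMNS].dropna().reset_index(drop=True)


# Toy frame for illustration only; "date" and "link" are made-up extra columns.
toy = pd.DataFrame(
    {
        "headline": ["h1", None, "h3", "h4"],
        "category": ["sport", "business", "sport", "politics"],
        "article": ["a1", "a2", np.nan, "a4"],
        "date": [None, "2020-01-02", "2020-01-03", "2020-01-04"],
        "link": ["l1", "l2", "l3", "l4"],
    }
)

cleaned = clean_news_frame(toy)
print(cleaned)
```

Rows 1 and 2 of the toy frame are removed because `headline` or `article` is missing, while row 0 survives even though its unused `date` value is missing.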
Good job @meuzgebre. We released this dataset so that many people would work towards improving this accuracy. We would like to see the result you mentioned become the SOTA for this dataset here.
If you have a write-up and updated code, we are happy to mention it in this README.
Hey @IsraelAbebe, check out my pull requests.
@meuzgebre, can you send a pull request to the new branch I created for you? I would like to put it there and edit the README.