understanding-ai
Local sparsity control for Naive Bayes with extreme misclassification costs
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.5667&rep=rep1&type=pdf
1. Introduction
- In the text domain, there is an excessive number of features
- Traditionally, sparsity was controlled with a global threshold that cuts off low-ranked features
- This paper suggests that a local, document-specific feature selection approach has potential benefits (see the sketch after this list)
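A minimal sketch of the two global cut-off strategies these notes refer to, assuming features have already been scored by some ranking function (e.g. information gain); the function names and the choice of ranking function are illustrative, not taken from the paper:

```python
import numpy as np

def select_by_count(scores: np.ndarray, k: int) -> np.ndarray:
    """Global feature-count cut-off: keep the k highest-scoring features."""
    return np.argsort(scores)[::-1][:k]

def select_by_threshold(scores: np.ndarray, t: float) -> np.ndarray:
    """Global sparsity cut-off: keep every feature scoring above t,
    so the retained feature count varies with the threshold."""
    return np.flatnonzero(scores > t)
```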
4. Sparsity control via feature selection
- A global sparsity cut-off (thresholding on feature ranking scores) works better than a feature count cut-off
- The local approach cannot be said to always beat the global approach, but it appears better in many cases (a sketch of the local variant follows this list)
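A minimal sketch of what document-specific (local) selection could look like, assuming selection is restricted to features that actually occur in the document being classified; the function name and the top-k form are assumptions, not the paper's exact formulation:

```python
import numpy as np

def select_local(doc_feature_ids: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical local selection: among the features present in this
    document, keep the k with the highest global ranking scores, so each
    document is classified on its own feature subset."""
    order = np.argsort(scores[doc_feature_ids])[::-1]  # descending by score
    return doc_feature_ids[order[:k]]
```

At prediction time, the Naive Bayes log-posterior for a document would then sum only over its own retained subset rather than over all globally selected features.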
6. Datasets
6.2. Model comparison
- NBLOC (the locally sparsified Naive Bayes variant) performs best
7. Results
8. Conclusions
- The standard Naive Bayes classifier has a propensity to make errors with high confidence
- This is especially true in the text domain, where overconfidence can stem from the large dimensionality of the feature space
- The paper proposes a local, document-specific feature selection approach
- Whether local feature selection is preferable depends on which dataset and feature ranking function are considered
- Naive Bayes can perform better with document-specific feature selection under extreme misclassification cost settings
- The paper shows that Naive Bayes benefits from document length normalization and TF-IDF term weighting (a rough analogue is sketched below)
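A rough, runnable analogue of that last point using scikit-learn; the exact weighting and normalization schemes in the paper may differ, and the toy documents and labels here are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# TF-IDF term weighting with L2 document-length normalization,
# feeding a multinomial Naive Bayes classifier.
model = make_pipeline(
    TfidfVectorizer(sublinear_tf=True, norm="l2"),  # tf-idf + length norm
    MultinomialNB(),
)

docs = ["cheap meds now", "meeting agenda attached"]  # toy training data
labels = [1, 0]
model.fit(docs, labels)
print(model.predict(["cheap meeting meds"]))
```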