understanding-ai
Local sparsity control for Naive Bayes with extreme misclassification costs
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.5667&rep=rep1&type=pdf
1. Introduction
- In the text domain, there is an excessive number of features
- Traditionally, sparsity was controlled with a global threshold that cuts off low-ranked features
- This paper suggests that a local, document-specific feature selection approach has potential benefits (see the sketch after this list)
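A minimal sketch of the two global cut-off strategies these notes refer to, assuming features have already been scored by some ranking function (e.g. information gain); the function names and the choice of ranking function are illustrative, not taken from the paper:

```python
import numpy as np

def select_by_count(scores: np.ndarray, k: int) -> np.ndarray:
    """Global feature-count cut-off: keep the k highest-scoring features."""
    return np.argsort(scores)[::-1][:k]

def select_by_threshold(scores: np.ndarray, t: float) -> np.ndarray:
    """Global sparsity cut-off: keep every feature scoring above t,
    so the retained feature count varies with the threshold."""
    return np.flatnonzero(scores > t)
```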
4. Sparsity control via feature selection
- A global sparsity cut-off (thresholding on feature ranking scores) works better than a feature count cut-off
- The local approach cannot be said to always beat the global approach, but it appears better in many cases (a sketch of the local variant follows this list)
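A minimal sketch of what document-specific (local) selection could look like, assuming selection is restricted to features that actually occur in the document being classified; the function name and the top-k form are assumptions, not the paper's exact formulation:

```python
import numpy as np

def select_local(doc_feature_ids: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical local selection: among the features present in this
    document, keep the k with the highest global ranking scores, so each
    document is classified on its own feature subset."""
    order = np.argsort(scores[doc_feature_ids])[::-1]  # descending by score
    return doc_feature_ids[order[:k]]
```

At prediction time, the Naive Bayes log-posterior for a document would then sum only over its own retained subset rather than over all globally selected features.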
6. Datasets
6.2. Model comparison
- NBLOC (the locally sparsified Naive Bayes variant) performs best
7. Results
8. Conclusions
- The standard Naive Bayes classifier has a propensity to make errors with high confidence
- This is especially true in the text domain, where overconfidence can stem from the large dimensionality of the feature space
- The paper proposes a local, document-specific feature selection approach
- Whether local feature selection is preferable depends on which dataset and feature ranking function are considered
- Naive Bayes can perform better with document-specific feature selection under extreme misclassification cost settings
- The paper shows that Naive Bayes benefits from document length normalization and TF-IDF term weighting (a rough analogue is sketched below)
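A rough, runnable analogue of that last point using scikit-learn; the exact weighting and normalization schemes in the paper may differ, and the toy documents and labels here are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# TF-IDF term weighting with L2 document-length normalization,
# feeding a multinomial Naive Bayes classifier.
model = make_pipeline(
    TfidfVectorizer(sublinear_tf=True, norm="l2"),  # tf-idf + length norm
    MultinomialNB(),
)

docs = ["cheap meds now", "meeting agenda attached"]  # toy training data
labels = [1, 0]
model.fit(docs, labels)
print(model.predict(["cheap meeting meds"]))
```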