
Local sparsity control for Naive Bayes with extreme misclassification costs


http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.5667&rep=rep1&type=pdf

1. Introduction

  • In the text domain there is an excessive number of features
  • To control sparsity, a cut-off threshold on features was traditionally used (a minimal sketch follows this list)
  • This paper suggests that local approaches to feature selection have potential benefits
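
Below is a minimal sketch of the traditional global cut-off mentioned above. It assumes document frequency as the ranking score and a hypothetical `min_df` threshold; the paper's actual scoring functions may differ.

```python
# Minimal sketch: score every feature once on the training corpus and
# keep only those above a fixed threshold. Document frequency and the
# min_df value here are illustrative assumptions, not the paper's choices.
from collections import Counter

def global_cutoff_vocab(docs, min_df=3):
    """Keep features whose document frequency is >= min_df."""
    df = Counter()
    for doc in docs:                 # doc: iterable of tokens
        df.update(set(doc))          # count each token once per document
    return {t for t, n in df.items() if n >= min_df}

docs = [["spam", "offer", "free"], ["meeting", "free", "agenda"],
        ["offer", "free", "spam"], ["free", "lunch"]]
print(global_cutoff_vocab(docs, min_df=2))  # {'spam', 'offer', 'free'}
```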

4. Sparsity control via feature selection

  • A global sparsity cut-off based on feature ranking is better than a cut-off based on feature count
  • The local approach cannot be said to always beat the global approach, but it seems better in many cases (see the sketch below)
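
The contrast between the two approaches can be made concrete with a hedged sketch (not the paper's implementation): `score` stands in for any feature-ranking function such as information gain, and the function names and `n`/`k` values are illustrative.

```python
# Global selection: one fixed top-N vocabulary used for every document.
# Local selection: a per-document top-k chosen only from the features
# actually present in that document.

def global_selection(score, n=1000):
    """One fixed vocabulary: the N best-scoring features overall."""
    return set(sorted(score, key=score.get, reverse=True)[:n])

def local_selection(doc_tokens, score, k=10):
    """Per-document vocabulary: the k best-scoring features in this doc."""
    present = [t for t in set(doc_tokens) if t in score]
    return set(sorted(present, key=score.get, reverse=True)[:k])

score = {"free": 2.1, "offer": 1.7, "meeting": 0.4, "agenda": 0.2}
doc = ["meeting", "agenda", "free"]
print(global_selection(score, n=2))      # {'free', 'offer'}
print(local_selection(doc, score, k=2))  # {'free', 'meeting'}
```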

6. Datasets

6.2. Model comparison

  • NBLOC performs best in the model comparison

7. Results


8. Conclusions

  • The standard Naive Bayes classifier has a propensity to make errors with high confidence
    • especially in the text domain, where overconfidence can come from the large dimensionality of the feature space
  • The paper proposes a local, document-specific approach to feature selection
  • Local feature selection may be preferable depending on which dataset and feature-ranking function are considered
  • Naive Bayes can perform better with document-specific feature selection at extreme misclassification-cost settings
  • The paper shows that Naive Bayes benefits from document-length normalization and TF-IDF term weighting (a minimal sketch follows this list)
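
As a rough illustration of that last point, the following sketch combines multinomial Naive Bayes with scikit-learn's `TfidfVectorizer`, which applies both TF-IDF weighting and L2 length normalization. The toy corpus and labels are invented; this is not the paper's exact setup.

```python
# Naive Bayes on TF-IDF weighted, length-normalized document vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["free offer click now", "meeting agenda attached",
         "free free offer", "project meeting notes"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(
    TfidfVectorizer(norm="l2"),  # TF-IDF weights + unit-length documents
    MultinomialNB(),
)
model.fit(texts, labels)
print(model.predict(["free meeting offer"]))
```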
