data_science_in_julia_for_hackers

Suggestions for Chapter 4

Open marisarabia opened this issue 1 year ago • 0 comments

Chapter 4

  • [x] Replace chapter name: Spam filter By: Spam filter: an introductory classification model

  • [x] Replace: How can Bayes help? By: How can Bayes’ theorem help?

  • [x] Replace section 4.2 name: The Training Data (it may cause confusion about whether pre-processing covers the whole dataset or only the training subset after splitting) By: The Data // The Input Data // The Modelling Data

  • [x] Define what corpus means, either in parentheses or in a footnote.

  • [ ] Key ideas about the pre-processing step could be clarified by adding a few images of what you are seeing, imagining or meaning, so that newcomers can follow how the data is transformed and how it can be read. Maybe this or this would help.

  • [ ] It should be explained how you start from a collection of emails and transform them into a kind of tabular record with frequency counts, in addition to other features and metadata.

  • [x] Replace: probabilties together. By: probabilities together.

  • [ ] In section 4.4, it may help to link back to the previous chapter's definitions of conditional probability and independence, to reinforce why a Naïve Bayes classification algorithm is used as the first approach (simplicity, speed of the naïve calculations, usual baseline, etc.). It could also be linked to appendix 4.9 and Alpha, so the reader understands why an extra explanation of this parameter is introduced.

  • [x] Regarding accuracy as the indicator for model evaluation, it could be mentioned that it is also a first-approach measure, and that other indicators are key, especially when working with unbalanced multiclass data.
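
To make the points above about the emails-to-table transformation, the Naïve Bayes independence assumption and the limits of accuracy more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the toy emails, the function names (`word_counts`, `classify`, `accuracy`, `recall`) and the label lists are invented here and are not taken from the book, and `alpha` is assumed to be the additive (Laplace) smoothing parameter that the appendix refers to as Alpha.

```python
import math
from collections import Counter

# Toy corpus invented for illustration; it is not the book's dataset.
emails = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
    ("project meeting notes", "ham"),
]

def word_counts(labelled_emails):
    """Turn raw emails into a per-class word-frequency table."""
    table = {"spam": Counter(), "ham": Counter()}
    for text, label in labelled_emails:
        table[label].update(text.lower().split())
    return table

def classify(text, table, priors, alpha=1.0):
    """Naive Bayes: add log-probabilities, treating words as independent given the class."""
    vocab = set(table["spam"]) | set(table["ham"])
    scores = {}
    for label, counts in table.items():
        total = sum(counts.values())
        score = math.log(priors[label])
        for word in text.lower().split():
            # alpha > 0 (assumed Laplace smoothing) keeps an unseen word
            # from driving the whole product to zero.
            score += math.log((counts[word] + alpha) / (total + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive="spam"):
    """Fraction of truly positive items that were predicted positive."""
    hits = sum(t == p == positive for t, p in zip(y_true, y_pred))
    return hits / sum(t == positive for t in y_true)

table = word_counts(emails)
priors = {label: sum(1 for _, l in emails if l == label) / len(emails) for label in table}

print(table["spam"])                            # word frequencies seen in spam
print(classify("cheap money", table, priors))   # spam
print(classify("meeting notes", table, priors)) # ham

# Hypothetical unbalanced labels: a model that always answers "ham"
# scores high on accuracy while never catching a single spam email.
y_true = ["ham"] * 9 + ["spam"]
always_ham = ["ham"] * 10
print(accuracy(y_true, always_ham))  # 0.9
print(recall(y_true, always_ham))    # 0.0
```

The last two lines show why accuracy alone can mislead on unbalanced data: the constant "ham" predictor reaches 90% accuracy with zero recall on the spam class.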

marisarabia, Feb 27 '23 19:02