data_science_in_julia_for_hackers

Chap 4 suggestions

Open salbert83 opened this issue 3 years ago • 3 comments

Another excellent chapter! Some minor suggestions:

  1. In spam_predict, replace [wrd for wrd in email_words if wrd in vocabulary] with intersect(email_words, model.vocabulary). Also, I think the code on the webpage has a typo (vocabulary instead of model.vocabulary).
  2. Sum log(probability) values instead of multiplying probabilities, to avoid floating-point underflow.
  3. In spam_filter_accuracy, why record predictions? It is unused and not returned.
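
Points 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's code: the toy vocabulary, the p_spam dictionary, and the helper names naive_score and log_score are all hypothetical. One caveat worth checking before making the swap: intersect returns each shared word once, while the comprehension keeps repeated words, so the two only agree when each word is meant to count once.

```julia
# Toy data, made up for illustration only.
vocabulary  = ["free", "money", "meeting"]
email_words = ["free", "free", "money", "hello"]

# The comprehension keeps duplicates; intersect returns each shared word once.
kept_all    = [wrd for wrd in email_words if wrd in vocabulary]
kept_unique = intersect(email_words, vocabulary)

# Hypothetical per-word probabilities P(word | spam).
p_spam = Dict("free" => 0.05, "money" => 0.02, "meeting" => 0.001)

# Multiplying many small probabilities underflows to 0.0 for long emails.
naive_score(words, probs) = prod((probs[w] for w in words); init = 1.0)

# Summing logarithms stays finite and preserves the ordering of the scores.
log_score(words, probs) = sum((log(probs[w]) for w in words); init = 0.0)

long_email = fill("free", 400)
println(naive_score(long_email, p_spam))  # underflows to 0.0
println(log_score(long_email, p_spam))    # finite negative number
```

With log scores, the spam and ham classes would be compared by adding the log of each class prior to its score and picking the larger total, instead of comparing products.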

Thank you.

salbert83, Apr 10 '21 11:04

One more: replace all_words_text = StringDocument(string([string(word, " ") for word in all_words]...)) with all_words_text = StringDocument(join(all_words, " "))
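
A small sketch of the difference between the two expressions, using only Base functions. The StringDocument wrapper (from TextAnalysis.jl) is left out to keep the example dependency-free, and the all_words vector here is made-up toy input.

```julia
all_words = ["buy", "cheap", "pills"]  # toy input, made up for illustration

# Original approach: build one string per word, then splat them into string().
old_text = string([string(word, " ") for word in all_words]...)

# Suggested approach: a single join call.
new_text = join(all_words, " ")

# The only difference in the result is the trailing space left by the original.
println(repr(old_text))
println(repr(new_text))
```

Besides being shorter, join avoids splatting a potentially very long vector into a function call, which can be slow for large vocabularies.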

Thanks

salbert83, Apr 10 '21 11:04

Also, I think you're missing the line that loads the file, something like raw_df = DataFrame(CSV.File(email_path))

Thanks

salbert83, Apr 10 '21 13:04

Hi @salbert83! Thanks for the suggestions! We are currently working on an updated version of chapter 4, so these comments will help us a lot. You are right that:

  • the raw_df = DataFrame(CSV.File(email_path)) line is missing
  • We have some typos
  • We are not using the predictions array
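
For reference, accuracy can be computed without storing a predictions array at all. The sketch below is hypothetical: toy_predict stands in for the chapter's classifier, and the accuracy function's signature is an assumption, not the actual spam_filter_accuracy from the book.

```julia
# Hypothetical stand-in for the chapter's classifier: flags an email as spam
# (true) when it contains the word "free".
toy_predict(email_words) = "free" in email_words

# Accuracy computed directly, without keeping an intermediate predictions array.
function accuracy(predict, emails, labels)
    correct = count(predict(e) == l for (e, l) in zip(emails, labels))
    return correct / length(labels)
end

# Toy data, made up for illustration: two of the three emails are classified correctly.
emails = [["free", "money"], ["team", "meeting"], ["free", "lunch"]]
labels = [true, false, false]
println(accuracy(toy_predict, emails, labels))
```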

We are going to use the code suggestions! In a few days we will upload the new version of the chapter and you will see the improvements.

Thank you very much!

pefontana, Apr 12 '21 14:04