data_science_in_julia_for_hackers
Chap 4 suggestions
Another excellent chapter! Some minor suggestions:
- In spam_predict, replace [wrd for wrd in email_words if wrd in vocabulary] with intersect(email_words, model.vocabulary). Also, I think the code on the webpage has a typo (vocabulary instead of model.vocabulary).
- Sum log(probability) values instead of multiplying raw probabilities, to avoid floating-point underflow when many small probabilities are combined.
- In spam_filter_accuracy, why record predictions? It is unused and not returned.
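
To illustrate the first two points, here is a minimal sketch. It does not use the chapter's actual code: the word lists and per-word probabilities are made-up values, and the variable names are hypothetical. One thing worth checking before swapping in intersect is that it drops repeated words, while the comprehension keeps them, so the two are only interchangeable if word counts within an email are not meant to matter.

```julia
# Hypothetical data, not taken from the chapter.
email_words = ["free", "money", "free", "meeting"]
vocabulary = ["free", "money", "offer"]

# The comprehension keeps repeated words...
kept_all = [wrd for wrd in email_words if wrd in vocabulary]
# ...while intersect returns each matching word only once.
kept_unique = intersect(email_words, vocabulary)

# Made-up per-word probabilities for a 100-word email under each class.
spam_probs = fill(1e-5, 100)
ham_probs = fill(2e-5, 100)

# Multiplying many small numbers underflows to 0.0 in Float64,
# so the two classes can no longer be told apart.
naive_spam = prod(spam_probs)
naive_ham = prod(ham_probs)

# Summing logarithms keeps the scores finite and comparable.
log_spam = sum(log, spam_probs)
log_ham = sum(log, ham_probs)

println(kept_all)
println(kept_unique)
println((naive_spam, naive_ham))
println(log_ham > log_spam)
```

Because log is monotonic, comparing the summed logs picks the same class as comparing the products would if they could be computed exactly.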
Thank you.
One more: replace all_words_text = StringDocument(string([string(word, " ") for word in all_words]...)) with all_words_text = StringDocument(join(all_words, " "))
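
A small sketch of the difference, using plain strings only (StringDocument is left out, and the word list is a made-up example). The two forms are not byte-for-byte identical: the splatted version leaves a trailing space, which join avoids. I am assuming the trailing space has no effect on tokenization downstream.

```julia
# Hypothetical word list, standing in for all_words.
all_words = ["free", "money", "offer"]

# Original approach: build one "word " string per word, then splat into string().
splatted = string([string(word, " ") for word in all_words]...)

# Suggested approach: join with a single-space delimiter.
joined = join(all_words, " ")

println(repr(splatted))
println(repr(joined))
```

join also avoids splatting a large array into a function call, which can be slow for long word lists.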
Thanks
Also, I think you're missing the line to load the file, something like raw_df = DataFrame(CSV.File(email_path))
Thanks
Hi @salbert83! Thanks for the suggestions! We are currently working on an updated version of chapter 4, so these comments will help us a lot. You are right that:
- The raw_df = DataFrame(CSV.File(email_path)) line is missing
- We have some typos
- We are not using the predictions array
We are going to use the code suggestions! In a few days we will upload the new version of the chapter and you will see the improvements.
Thank you very much!