data_science_in_julia_for_hackers Chap 4 suggestions

Open salbert83 opened this issue 3 years ago • 3 comments

Another excellent chapter! Some minor suggestions:

in spam_predict, replace [wrd for wrd in email_words if wrd in vocabulary] with intersect(email_words, model.vocabulary). Also, I think the code on the webpage has a typo (vocabulary instead of model.vocabulary).
Use log(probability) instead of probability to avoid numerical underflow when many small values are multiplied
In spam_filter_accuracy, why record predictions? It is unused and not returned.
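
The first two suggestions can be illustrated with a minimal sketch. The word lists and probabilities below are made-up values, not the chapter's model or data. Note that intersect returns unique elements, so it matches the comprehension only when email_words has no repeated words (or when repeats should not count twice).

```julia
# Hypothetical inputs for illustration only.
email_words = ["free", "money", "free", "hello"]
vocabulary = ["free", "money", "offer"]

# The comprehension keeps repeated words...
filtered = [wrd for wrd in email_words if wrd in vocabulary]
# ...while intersect keeps each matching word once, in order of first appearance.
unique_filtered = intersect(email_words, vocabulary)

# Hypothetical per-word likelihoods: 100 words, each with probability 1e-5.
word_probs = fill(1e-5, 100)

# Multiplying raw probabilities underflows Float64 to exactly 0.0.
raw_product = prod(word_probs)

# Summing logs carries the same information in a representable range.
log_score = sum(log, word_probs)
```

With log scores, the class comparison becomes a comparison of sums, which is unaffected by the underflow that turns both products into 0.0.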

Thank you.

Apr 10 '21 11:04 salbert83

One more, replace all_words_text = StringDocument(string([string(word, " ") for word in all_words]...)) with all_words_text = StringDocument(join(all_words, " "))
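
As a small sketch of the difference, using a made-up all_words vector and leaving out the StringDocument wrapper: the splatted version leaves a trailing space, while join does not, and join avoids building an intermediate array of strings.

```julia
# Hypothetical word list for illustration only.
all_words = ["free", "money", "now"]

# Original approach: append a space to every word, then splat into string.
splatted = string([string(word, " ") for word in all_words]...)

# Suggested approach: join with a single-space delimiter.
joined = join(all_words, " ")
```

The two results differ only by the trailing space, which should not matter if the text is tokenized on whitespace afterwards.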

Thanks

Apr 10 '21 11:04 salbert83

Also, I think you're missing the line to load the file, something like raw_df = DataFrame(CSV.File(email_path))
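
A minimal sketch of that loading step, assuming the CSV.jl and DataFrames.jl packages are installed. The temporary file and its column names (text, label) are hypothetical stand-ins for the chapter's dataset.

```julia
using CSV, DataFrames

# Hypothetical stand-in for the real dataset: a tiny CSV in a temp file.
email_path = tempname() * ".csv"
write(email_path, "text,label\n\"win money now\",1\n\"meeting at noon\",0\n")

# The form suggested in the comment above.
raw_df = DataFrame(CSV.File(email_path))

# An equivalent form that reads straight into a DataFrame.
raw_df_alt = CSV.read(email_path, DataFrame)
```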

Thanks

Apr 10 '21 13:04 salbert83

Hi @salbert83! Thanks for the suggestions! We are currently working on an update of chapter 4, so these comments will help us a lot. You are right that:

the raw_df = DataFrame(CSV.File(email_path)) line is missing
We have some typos
We are not using the predictions array
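
On the unused predictions array, one possible shape for an accuracy function that counts correct predictions directly is sketched below. The predict function, the emails, and the labels are all hypothetical; the chapter's spam_filter_accuracy may take different arguments.

```julia
# Hypothetical stand-in for spam_predict: flags any email containing "free".
predict(email) = occursin("free", email) ? 1 : 0

# Made-up emails and labels (1 = spam, 0 = not spam).
emails = ["free money", "lunch tomorrow?", "free tickets", "project update"]
labels = [1, 0, 0, 0]

# Count matches directly instead of filling a predictions array first.
function accuracy(predict, emails, labels)
    correct = count(predict(e) == l for (e, l) in zip(emails, labels))
    return correct / length(labels)
end

acc = accuracy(predict, emails, labels)
```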

We are going to use the code suggestions! In a few days we will upload the new chapter version and you will see the improvements.

Thank you very much!

Apr 12 '21 14:04 pefontana
