data_science_in_julia_for_hackers

Tracking issue: List of pending technical details to fix in chapters

Open entropidelic opened this issue 1 year ago • 0 comments

Chapter 4:

  • Split into two paragraphs, maybe extend the first one:

First, we would like to filter some words that are very common in the English language, such as articles and pronouns, which will most likely add noise rather than information to our classification algorithm. For this we will use two Julia packages that are specially designed for working with texts of any type. These are Languages.jl and TextAnalysis.jl.

A good practice when dealing with models that learn from data like the one we are going to implement, is to divide our data into two: a training set and a testing set. We need to measure how good our model is performing, so we will train it with some data, and test it with some other data the model has never seen. This way we may be sure that the model is not tricking us. In Julia, the package MLDataUtils has some nice functionalities for data manipulations like this. We will use the functions splitobs to split our dataset in a train set and a test set and shuffleobs to randomize the order of our data in the split. It is important also to pass a labels array to our split function so that it knows how to properly split our dataset.

Note: I have the impression there are many places where sentences should have been split into two or more paragraphs. Maybe this is a rendering issue and the sentences are separated in the source?
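
To make the shuffle-then-split step in the quoted paragraph concrete, here is a minimal sketch in plain Python. It is not the book's MLDataUtils code: the function name `shuffle_and_split`, the 70/30 fraction, and the toy emails are all assumptions made for illustration. The point it shows is that data and labels are shuffled with the same permutation, so each email keeps its label on both sides of the split.

```python
import random


def shuffle_and_split(data, labels, train_fraction=0.7, seed=42):
    """Shuffle data and labels together, then cut them into train and test parts.

    Hypothetical helper for illustration; it only mirrors the idea behind
    shuffling and splitting, not any specific library function.
    """
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")

    # One shared permutation keeps every observation paired with its label.
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)

    cut = int(train_fraction * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]

    x_train = [data[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [data[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return (x_train, y_train), (x_test, y_test)


# Toy data, invented for this sketch.
emails = [f"email {i}" for i in range(10)]
labels = ["spam" if i % 3 == 0 else "ham" for i in range(10)]

(x_train, y_train), (x_test, y_test) = shuffle_and_split(emails, labels)
print(len(x_train), "training emails,", len(x_test), "test emails")
```

A sentence along these lines in the chapter, next to the actual Julia call, might be enough to explain why the labels array is passed to the split.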


  • Explain the formulas in one or two sentences; consider adding a cross-reference to the relevant section in Chapter 2.

The probability of finding a particular word in an email, given that we have a spam email, can be calculated like so:


  • How to compute the priors is not explained in the text
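
As a possible starting point for the two missing explanations, the sketch below shows one common way to estimate the quantities involved: the prior of each class as the fraction of training emails with that label, and the probability of a word given a class as a smoothed relative frequency. This is an assumption about what the chapter intends, not a transcription of its formulas. The function names, the toy emails, and the add-one smoothing parameter `alpha` are all hypothetical.

```python
from collections import Counter


def estimate_priors(labels):
    """Prior of each class = fraction of training emails carrying that label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def word_likelihoods(emails, labels, target, alpha=1.0):
    """Estimate P(word | target class) from word counts.

    alpha is an additive smoothing constant (an assumption of this sketch);
    it keeps words unseen in the target class from getting probability zero.
    """
    word_counts = Counter()
    vocabulary = set()
    for text, label in zip(emails, labels):
        words = text.lower().split()
        vocabulary.update(words)
        if label == target:
            word_counts.update(words)

    denominator = sum(word_counts.values()) + alpha * len(vocabulary)
    return {w: (word_counts[w] + alpha) / denominator for w in vocabulary}


# Toy corpus, invented for this sketch.
emails = ["win money now", "meeting at noon", "win a prize now", "lunch at noon"]
labels = ["spam", "ham", "spam", "ham"]

priors = estimate_priors(labels)
spam_likelihood = word_likelihoods(emails, labels, "spam")
print(priors)
print(spam_likelihood["win"])
```

If the chapter uses a different estimate (for example, counting emails that contain a word rather than word occurrences), the explanation should state that choice explicitly.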

Chapter 6

  • Out of the blue

The sentence "So, the model we are going to propose is a linear regression. A linear equation has the form:"

is not well connected with the previous paragraphs. It is not clear why we need a linear model at all. The transition from mechanics to statistics should be smoother.

Maybe you can invert the order of the story:

We want to escape from Mars.
We need to find out the escape velocity.
For that we need to find g_mars.
We realize x = f(g, t), so we can throw stones to find g!
But measurements are noisy, hence we create the model x ~ Normal(f(g, t), σ).
We now try to find f from what we remember from high-school physics.
We got it! We collect a few data points.
We explore a few priors.

  • Justify Priors

Say something like: we know it has to be positive and less than g_earth, which is 9.8 m/s², and we can round that up to 10.

Do the same for the other two priors.


  • Discuss the posterior with angle uncertainty vs the one without it.

Consider a new plot showing them side by side or overlaid. Discuss the mean and standard deviation, or the HDI (or some other interval).
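
To illustrate the kind of comparison suggested in this list, here is a rough grid-approximation sketch in plain Python. Everything specific in it is an assumption: it takes f(g, t) = ½·g·t² (a drop from rest, since the issue does not say which f the chapter uses), treats σ as known, puts a flat prior on g over (0, 10) as proposed above, and uses synthetic data. A larger σ serves only as a crude stand-in for a model with extra uncertainty, and the interval reported is a central interval, not an HDI.

```python
import math
import random


def grid_posterior(times, distances, sigma, g_grid):
    """Posterior over g on a grid for x ~ Normal(0.5 * g * t^2, sigma), flat prior."""
    log_post = []
    for g in g_grid:
        log_lik = sum(
            -0.5 * ((x - 0.5 * g * t**2) / sigma) ** 2
            for t, x in zip(times, distances)
        )
        log_post.append(log_lik)  # a flat prior only adds a constant

    # Subtract the peak before exponentiating to avoid numerical underflow.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def summarize(g_grid, posterior, mass=0.94):
    """Return the posterior mean, standard deviation, and a central interval."""
    mean = sum(g * p for g, p in zip(g_grid, posterior))
    sd = math.sqrt(sum((g - mean) ** 2 * p for g, p in zip(g_grid, posterior)))

    tail = (1.0 - mass) / 2.0
    cumulative, lo, hi = 0.0, None, None
    for g, p in zip(g_grid, posterior):
        cumulative += p
        if lo is None and cumulative >= tail:
            lo = g
        if hi is None and cumulative >= 1.0 - tail:
            hi = g
    return mean, sd, (lo, hi)


# Synthetic measurements; 3.7 is roughly Mars surface gravity in m/s^2.
rng = random.Random(1)
true_g = 3.7
times = [0.5, 1.0, 1.5, 2.0, 2.5]
distances = [0.5 * true_g * t**2 + rng.gauss(0.0, 0.5) for t in times]
g_grid = [i * 0.01 for i in range(1, 1000)]  # flat prior support: (0, 10)

summaries = {}
for name, sigma in [("narrow", 0.5), ("wide", 1.0)]:
    summaries[name] = summarize(g_grid, grid_posterior(times, distances, sigma, g_grid))
    mean, sd, (lo, hi) = summaries[name]
    print(f"{name}: mean={mean:.2f} sd={sd:.2f} interval=({lo:.2f}, {hi:.2f})")
```

Reporting the two summaries next to each other like this, alongside the plot, would make the effect of the extra uncertainty easy to read off.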

entropidelic, Apr 04 '23 14:04