Typos (book 1, version 2023-06-23)
Fig. 19.1: a image images
19.2.3: an a variety
19.3.5: initialize set
Fig. 19.11:
lossses
model predictions class 0
19.3.6: semi-supervised without a noun
20.2.6 and other places:
Nitpicking, but the book uses American spelling throughout, except for analysers. (And except for Fig. 20.11.)
20.2.6: in “let latent indicator specifying…”, “specifying” had better be “specify”
In “In the special case that”, “that” is superfluous
Eq. (20.89), (20.90): $\mathbf B$ in equations, $\mathbf V$ in description
Eq. (20.92): Should $\mathbf B$ really be absent?
Direct application of (3.38) results in
$$ \begin{bmatrix} \mathbf W_x \mathbf W_x^{\mathrm T} + \mathbf B_x \mathbf B_x^{\mathrm T} & \mathbf W_x \mathbf W_y^{\mathrm T} \\ \mathbf W_y \mathbf W_x^{\mathrm T} & \mathbf W_y \mathbf W_y^{\mathrm T} + \mathbf B_y \mathbf B_y^{\mathrm T} \end{bmatrix} $$
Am I doing something wrong?
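For what it's worth, here is a quick Monte Carlo sanity check of the block matrix above. It assumes the standard CCA-style generative model $\mathbf x = \mathbf W_x \mathbf z + \mathbf B_x \mathbf u_x$, $\mathbf y = \mathbf W_y \mathbf z + \mathbf B_y \mathbf u_y$ with independent standard-normal latents and no observation noise; the dimensions and loadings below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary small dimensions; the loadings are random placeholders.
dz, dux, duy, dx, dy = 2, 3, 3, 4, 5
Wx = rng.standard_normal((dx, dz))
Wy = rng.standard_normal((dy, dz))
Bx = rng.standard_normal((dx, dux))
By = rng.standard_normal((dy, duy))

# Generative model: shared latent z, private latents ux, uy, all N(0, I).
n = 200_000
z = rng.standard_normal((n, dz))
ux = rng.standard_normal((n, dux))
uy = rng.standard_normal((n, duy))
x = z @ Wx.T + ux @ Bx.T
y = z @ Wy.T + uy @ By.T

# Block covariance claimed above.
analytic = np.block([
    [Wx @ Wx.T + Bx @ Bx.T, Wx @ Wy.T],
    [Wy @ Wx.T, Wy @ Wy.T + By @ By.T],
])
empirical = np.cov(np.hstack([x, y]).T)
print(np.max(np.abs(empirical - analytic)))  # small; shrinks as n grows
```

So at least under this reading of the model, the $\mathbf B_x \mathbf B_x^{\mathrm T}$ and $\mathbf B_y \mathbf B_y^{\mathrm T}$ terms do appear on the diagonal blocks.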
Eq. (20.111–113) Confusing notation
Is $d_{i, j}$ the same as $d_{ij}$? Should (20.112) really have $\hat d_{ij}^2$ in the denominator, while (20.111) and (20.113) have plain $d_{ij}^2$?
Eq. (20.133)
seems derivable only if the $k \ne i$ restriction is removed from both (20.130) and (20.131)
Eq. (20.133), (20.136):
Is it really $\mathbf z_j - \mathbf z_i$ rather than the other way around? (20.138) is different
Eq. (20.138): a factor of 4 instead of 2?
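A finite-difference check supports the factor of 4. The snippet below uses the usual t-SNE objective $C = \mathrm{KL}(P \,\|\, Q)$ with Student-t low-dimensional affinities, and the gradient $\frac{\partial C}{\partial \mathbf z_i} = 4 \sum_j (p_{ij} - q_{ij})(\mathbf z_i - \mathbf z_j)(1 + \|\mathbf z_i - \mathbf z_j\|^2)^{-1}$; here $P$ is just a random symmetric placeholder rather than perplexity-calibrated affinities:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 2
Z = rng.standard_normal((n, d))

# Placeholder symmetric target probabilities P (not computed from real data).
P = rng.random((n, n)); P = P + P.T
np.fill_diagonal(P, 0.0); P /= P.sum()

def q_and_num(Z):
    D2 = np.sum((Z[:, None] - Z[None]) ** 2, axis=-1)
    num = 1.0 / (1.0 + D2)          # Student-t kernel
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num

def kl(Z):
    Q, _ = q_and_num(Z)
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

# Analytic gradient with the factor of 4.
Q, num = q_and_num(Z)
G = 4 * np.sum((P - Q)[:, :, None] * num[:, :, None]
               * (Z[:, None] - Z[None]), axis=1)

# Central finite differences.
eps = 1e-6
G_fd = np.zeros_like(Z)
for i in range(n):
    for k in range(d):
        Zp = Z.copy(); Zp[i, k] += eps
        Zm = Z.copy(); Zm[i, k] -= eps
        G_fd[i, k] = (kl(Zp) - kl(Zm)) / (2 * eps)

print(np.max(np.abs(G - G_fd)))  # tiny with the factor of 4; off by 2x otherwise
```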
20.5.2.2: central (target) word to be predicted
Skipgram models predict context rather than the target word.
21.1: data for some data
21.2.3: and g
21.4.1.3: In “higher than for the other”, “for” is superfluous
22.2.2: may choosing
22.3: such text
p. 749: made to its
23.1: method to be applicable
Eq. (23.6), (23.14), and maybe others: confusing notation.
(23.4) defines graph reconstruction and weight regularization losses.
In all other places, $\mathcal L_\text{G, RECON}$ is called graph regularization loss.
23.3.5: maximizes their likelihood
implications are studied
In “is given by … can be approximated”, “is given by” is superfluous
23.5.1.2: autoencoders rely
23.5.1.4: discriminator
Eq. (23.35): Shouldn't $\mathbf{\hat W}_{ij}$ be an outer product $\mathbf Z_i \mathbf Z_j^{\mathrm T}$?
p. 767: “… sampling process” sentence doesn't end with a period.
23.6.1.2: no comma after social networks
Fig. 2.17 says
which can be true only if $\mathrm{mode}\, x = \max\left(\frac{a - 1}{b}, 0\right)$, but (2.142) forgets the max
The same applies to beta distribution: Fig. 2.17 says
If $a < 1$, we get a “spike” on the left, and if $b < 1$, we get a “spike” on the right. If $a = b = 1$, the distribution is uniform. If $a > 1$ and $b > 1$, the distribution is unimodal.
but the mode in (2.139), $\frac{a - 1}{a + b - 2}$, applies only to the $a > 1$, $b > 1$ case.
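A two-line numeric check of the gamma case (the beta cases with $a < 1$ or $b < 1$ behave analogously): for $a < 1$ the density $\propto x^{a-1} e^{-bx}$ is monotonically decreasing on $(0, \infty)$, so the mode is at 0 even though $\frac{a-1}{b}$ is negative, which is exactly what the max fixes:

```python
import numpy as np
from math import gamma as gamma_fn

a, b = 0.5, 2.0  # a < 1, so (a - 1)/b = -0.25 < 0
x = np.linspace(1e-6, 5, 10_000)
pdf = b**a / gamma_fn(a) * x**(a - 1) * np.exp(-b * x)
print(x[np.argmax(pdf)])  # argmax is at the left edge of the grid, i.e. mode -> 0
```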
Wow, thank you for the detailed feedback! I will incorporate it into the next update.
I agree the denominator terms for MDS seem inconsistent. I am not sure it is correct, but FWIW, ChatGPT agrees with me.
You seem to be right about the errors in my description of tSNE. Based on https://jmlr.org/papers/v9/vandermaaten08a.html, your mods are correct, except they do include the $k \ne i$ term in the denominator.