Typos (book 1, version 2023-06-23)
Fig. 19.1: a image images
19.2.3: an a variety
19.3.5: initialize set
Fig. 19.11:
lossses
model predictions class 0
19.3.6: semi-supervised without a noun
20.2.6 and other places:
Nitpicking, but the book uses American spelling throughout, except for analysers. (And except for Fig. 20.11.)
20.2.6: in “let latent indicator specifying…”, “specifying” had better be “specify”
In “In the special case that”, “that” is superfluous
Eq. (20.89), (20.90): $\mathbf B$ in equations, $\mathbf V$ in description
Eq. (20.92): Should $\mathbf B$ really be absent?
Direct application of (3.38) results in
$$ \begin{bmatrix} \mathbf W_x \mathbf W_x^{\mathrm T} + \mathbf B_x \mathbf B_x^{\mathrm T} & \mathbf W_x \mathbf W_y^{\mathrm T} \\ \mathbf W_y \mathbf W_x^{\mathrm T} & \mathbf W_y \mathbf W_y^{\mathrm T} + \mathbf B_y \mathbf B_y^{\mathrm T} \end{bmatrix} $$
Am I doing something wrong?
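For what it's worth, here is a quick Monte Carlo sanity check of the block matrix above. It assumes the standard CCA-style generative model $\mathbf x = \mathbf W_x \mathbf z + \mathbf B_x \mathbf u_x$, $\mathbf y = \mathbf W_y \mathbf z + \mathbf B_y \mathbf u_y$ with independent standard-normal latents and no observation noise; the dimensions and loadings below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary small dimensions; the loadings are random placeholders.
dz, dux, duy, dx, dy = 2, 3, 3, 4, 5
Wx = rng.standard_normal((dx, dz))
Wy = rng.standard_normal((dy, dz))
Bx = rng.standard_normal((dx, dux))
By = rng.standard_normal((dy, duy))

# Generative model: shared latent z, private latents ux, uy, all N(0, I).
n = 200_000
z = rng.standard_normal((n, dz))
ux = rng.standard_normal((n, dux))
uy = rng.standard_normal((n, duy))
x = z @ Wx.T + ux @ Bx.T
y = z @ Wy.T + uy @ By.T

# Block covariance claimed above.
analytic = np.block([
    [Wx @ Wx.T + Bx @ Bx.T, Wx @ Wy.T],
    [Wy @ Wx.T, Wy @ Wy.T + By @ By.T],
])
empirical = np.cov(np.hstack([x, y]).T)
print(np.max(np.abs(empirical - analytic)))  # small; shrinks as n grows
```

So at least under this reading of the model, the $\mathbf B_x \mathbf B_x^{\mathrm T}$ and $\mathbf B_y \mathbf B_y^{\mathrm T}$ terms do appear on the diagonal blocks.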
Eq. (20.111–113) Confusing notation
Is $d_{i, j}$ the same as $d_{ij}$? Should (20.112) really have $\hat d_{ij}^2$ in the denominator, while (20.111) and (20.113) have plain $d_{ij}^2$?
Eq. (20.133)
seems derivable only if the $k \ne i$ restriction is removed from both (20.130) and (20.131)
Eq. (20.133), (20.136):
Is it really $\mathbf z_j - \mathbf z_i$ rather than the other way around? (20.138) is different
Eq. (20.138): a factor of 4 instead of 2?
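A finite-difference check supports the factor of 4. The snippet below uses the usual t-SNE objective $C = \mathrm{KL}(P \,\|\, Q)$ with Student-t low-dimensional affinities, and the gradient $\frac{\partial C}{\partial \mathbf z_i} = 4 \sum_j (p_{ij} - q_{ij})(\mathbf z_i - \mathbf z_j)(1 + \|\mathbf z_i - \mathbf z_j\|^2)^{-1}$; here $P$ is just a random symmetric placeholder rather than perplexity-calibrated affinities:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 2
Z = rng.standard_normal((n, d))

# Placeholder symmetric target probabilities P (not computed from real data).
P = rng.random((n, n)); P = P + P.T
np.fill_diagonal(P, 0.0); P /= P.sum()

def q_and_num(Z):
    D2 = np.sum((Z[:, None] - Z[None]) ** 2, axis=-1)
    num = 1.0 / (1.0 + D2)          # Student-t kernel
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num

def kl(Z):
    Q, _ = q_and_num(Z)
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

# Analytic gradient with the factor of 4.
Q, num = q_and_num(Z)
G = 4 * np.sum((P - Q)[:, :, None] * num[:, :, None]
               * (Z[:, None] - Z[None]), axis=1)

# Central finite differences.
eps = 1e-6
G_fd = np.zeros_like(Z)
for i in range(n):
    for k in range(d):
        Zp = Z.copy(); Zp[i, k] += eps
        Zm = Z.copy(); Zm[i, k] -= eps
        G_fd[i, k] = (kl(Zp) - kl(Zm)) / (2 * eps)

print(np.max(np.abs(G - G_fd)))  # tiny with the factor of 4; off by 2x otherwise
```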
20.5.2.2: central (target) word to be predicted
Skipgram models predict context rather than the target word.
21.1: data for some data
21.2.3: and g
21.4.1.3: In “higher than for the other”, “for” is superfluous
22.2.2: may choosing
22.3: such text
p. 749: made to its
23.1: method to be applicable
Eq. (23.6), (23.14), and maybe others: confusing notation.
(23.4) defines graph reconstruction and weight regularization losses.
In all other places, $\mathcal L_\text{G, RECON}$ is called graph regularization loss.
23.3.5: maximizes their likelihood
implications are studied
In “is given by … can be approximated”, “is given by” is superfluous
23.5.1.2: autoencoders rely
23.5.1.4: discriminator
Eq. (23.35): Shouldn't $\mathbf{\hat W}_{ij}$ be an outer product $\mathbf Z_i \mathbf Z_j^{\mathrm T}$?
p. 767: “… sampling process” sentence doesn't end with a period.
23.6.1.2: no comma after social networks
Fig. 2.17 says
which can be true only if $\mathrm{mode}\, x = \max\left(\frac{a - 1}{b}, 0\right)$, but (2.142) forgets the max
The same applies to beta distribution: Fig. 2.17 says
If $a < 1$, we get a “spike” on the left, and if $b < 1$, we get a “spike” on the right. If $a = b = 1$, the distribution is uniform. If $a > 1$ and $b > 1$, the distribution is unimodal.
but the mode in (2.139), $\frac{a - 1}{a + b - 2}$, applies only to the $a > 1$, $b > 1$ case.
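A two-line numeric check of the gamma case (the beta cases with $a < 1$ or $b < 1$ behave analogously): for $a < 1$ the density $\propto x^{a-1} e^{-bx}$ is monotonically decreasing on $(0, \infty)$, so the mode is at 0 even though $\frac{a-1}{b}$ is negative, which is exactly what the max fixes:

```python
import numpy as np
from math import gamma as gamma_fn

a, b = 0.5, 2.0  # a < 1, so (a - 1)/b = -0.25 < 0
x = np.linspace(1e-6, 5, 10_000)
pdf = b**a / gamma_fn(a) * x**(a - 1) * np.exp(-b * x)
print(x[np.argmax(pdf)])  # argmax is at the left edge of the grid, i.e. mode -> 0
```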
Wow, thank you for the detailed feedback! I will incorporate it into the next update.
I agree the denominator terms for MDS seem inconsistent. I am not sure it is correct, but FWIW, ChatGPT agrees with me.
You seem to be right about the errors in my description of tSNE. Based on https://jmlr.org/papers/v9/vandermaaten08a.html, your mods are correct, except they do include the $k \ne i$ term in the denominator.