[QUESTION] What is the correct form of Equation 4-13 in CH. 4 Logistic Regression?
It's not clear why the theta in this formula is transposed.
I believe it should have been the x that gets transposed. So this is the correct equation.
For two vectors $a$ and $b$, we have $a^Tb = b^T a$, so the two expressions in your post are identical.
Thanks for your question @siavashr99. As @LittleCottage said, the dot product of two vectors is commutative: a · b = b · a. By the way, in Machine Learning, the dot product of two vectors a and b is often denoted a⊺b. Mathematically speaking, that's not quite right: a and b would have to be column vectors (i.e., matrices with a single column), and the result would be a 1 ⨉ 1 matrix with a single item equal to a · b. But this notation has the advantage that we can write things the same way whether we're talking about vectors or matrices (there's a note near the start of Chapter 4 about this).
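Here's a minimal NumPy sketch (with made-up numbers, just for illustration) showing both points: the dot product of 1-D vectors is commutative, and treating them as column vectors turns θ⊺x into a 1 ⨉ 1 matrix holding that same value:

```python
import numpy as np

# Made-up 3-dimensional vectors, just to illustrate the point
theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, -2.0])

# For 1-D arrays, the dot product is commutative
print(theta @ x)  # -6.5
print(x @ theta)  # -6.5, same value

# As column vectors (d × 1 matrices), θ⊺x is a 1 × 1 matrix with that same value
theta_col = theta.reshape(-1, 1)
x_col = x.reshape(-1, 1)
print(theta_col.T @ x_col)  # [[-6.5]]
```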
Now if you're talking about a matrix X rather than a vector x, then things are different: the order matters. Suppose X is an m ⨉ d matrix (there are m samples, and each of them is d-dimensional), and the weight matrix Θ is n ⨉ d (where n is the output dimensionality). Then we usually compute XΘ⊺, since this gives us an output matrix of shape m ⨉ n: that's usually the shape we want, with one row per sample (just like X has one row per sample).
In many cases, the weight matrix is directly represented as a d ⨉ n matrix (instead of n ⨉ d), so there's no need to transpose it: we just compute XΘ.
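To make the shapes concrete, here's a small NumPy sketch (the sizes are arbitrary, just for illustration) showing that the two conventions give the same m ⨉ n output:

```python
import numpy as np

m, d, n = 5, 3, 2  # arbitrary sizes: 5 samples, 3 features, 2 outputs
rng = np.random.default_rng(42)

X = rng.normal(size=(m, d))        # one row per sample
Theta = rng.normal(size=(n, d))    # n × d convention: needs a transpose

Y1 = X @ Theta.T                   # shape (m, n), one row per sample
print(Y1.shape)                    # (5, 2)

# Same computation with the d × n convention: no transpose needed
Theta2 = Theta.T                   # shape (d, n)
Y2 = X @ Theta2
print(np.allclose(Y1, Y2))         # True
```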
Hope this helps.