website Some questions/comments on Categorical Predictors

Open bmreiniger opened this issue 9 months ago • 0 comments

The example with each agent working with a single customer type introduced in 5.2:
1. I think the row-wise sum comment could use some clarification; it's the sum among agents with a given customer type, and the single customer type column?
2. Later, in 5.4.3, the example is reused, but I think the language is stronger: "agent was aliased with the customer type" to me means there's a one-to-one correspondence rather than the many-to-one relationship I think the original implied. And in a one-to-one relationship, the effect encodings will end up being identical, so the argument fails. Separately: can we add a ref-link?
Figure 5.1 typo "distirbution"
In 5.4, I would expect to see some mention of coarsening the categories according to domain knowledge (e.g. states into regions). Maybe also model-based coarsening that uses other predictors?
The Cerda & Varoquaux citation seems to deal more with encodings that take the string nature of the predictor into account, with a hint of natural language processing to it.
In 5.4.2, I'm not sure whether adding a -1 to the hashing values leads to "fewer collisions"; it depends on what exactly you mean by a collision, and I'm not familiar enough with the cryptography literature to say. But in a parametric model, it's still enforcing some arbitrary constraint.
The intro to 5.3.2 says "different" supervised tool, but it's the only supervised tool in the chapter.
In 5.5, I'd like a small note about integer-encoding the values being reasonable for certain models. (Again, "will be discussed more later", but a preview would be nice.)
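
To make the one-to-one point about 5.4.3 concrete, here is a minimal sketch of what I mean. It assumes an "effect encoding" is just the per-category mean of the outcome with no shrinkage or partial pooling, which is simpler than what the chapter fits; the data, the agent labels, and the `effect_encode` helper are all made up for illustration.

```python
from collections import defaultdict


def effect_encode(categories, outcomes):
    """Replace each category with the mean outcome seen for that category.

    Hypothetical helper: plain per-category means, no shrinkage.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, outcomes):
        totals[cat] += y
        counts[cat] += 1
    return [totals[cat] / counts[cat] for cat in categories]


# Made-up toy data: eight rows, two customer types.
outcome = [1, 0, 1, 1, 0, 0, 1, 0]
customer = ["A", "A", "A", "A", "B", "B", "B", "B"]

# One-to-one: each customer type is handled by exactly one agent.
agent_one_to_one = ["a1"] * 4 + ["b1"] * 4

# Many-to-one: two agents per customer type, each agent sees one type only.
agent_many_to_one = ["a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"]

customer_enc = effect_encode(customer, outcome)
one_to_one_enc = effect_encode(agent_one_to_one, outcome)
many_to_one_enc = effect_encode(agent_many_to_one, outcome)

# One-to-one: the agent column is an exact copy of the customer column.
print(one_to_one_enc == customer_enc)
# Many-to-one: the agent column carries information the customer column lacks.
print(many_to_one_enc == customer_enc)
```

With a one-to-one mapping the two encoded columns coincide, so the agent encoding cannot add anything beyond customer type. With a many-to-one mapping the agent means differ within a customer type, which is the situation I read 5.2 as describing.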

Apr 30 '24 14:04 bmreiniger
