Categorical features in logistic regression

Open allenchng opened this issue 7 years ago • 5 comments

Hi,

I'm following the logistic regression example (https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/logistic_regression.py) and trying to implement it with some real-life data that has a mixture of numerical and categorical variables. Each of my categorical variables has a large number of levels, though, and I'm wondering if one-hot encoding is the best approach. Is there a better way of treating categorical variables in TF/TFP? Thanks.
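For concreteness, a minimal sketch of the one-hot approach in question. The variable names and sizes are hypothetical, not from any actual dataset; the point is just that a categorical variable with many levels produces a very wide, mostly-zero design matrix:

```python
import tensorflow as tf

num_levels = 5000  # a high-cardinality categorical variable (assumed)
category_ids = tf.constant([3, 42, 4999])  # integer-encoded observations

# One column per level: the encoded feature is almost entirely zeros.
one_hot = tf.one_hot(category_ids, depth=num_levels)
print(one_hot.shape)  # (3, 5000)
```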

allenchng avatar Jul 05 '18 14:07 allenchng

Have you tried embedding layers (e.g., https://www.tensorflow.org/guide/embedding)? They compress one-hot encodings into a low-dimensional real-valued space (e.g., from vocabulary_size=1024 down to embedding_size=64). This hasn't typically been done for regression analyses, but it's a natural idea once you're familiar with the technique. It would be nice to see a tutorial or example of it.
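A minimal sketch of that idea, assuming a Keras-style setup rather than the raw TFP example code; the sizes and input names below are illustrative:

```python
import tensorflow as tf

vocabulary_size = 1024  # distinct category levels (assumed)
embedding_size = 64     # learned low-dimensional representation (assumed)
num_numeric = 10        # count of numeric features (assumed)

categorical_input = tf.keras.Input(shape=(1,), dtype=tf.int32)
numeric_input = tf.keras.Input(shape=(num_numeric,), dtype=tf.float32)

# Map each integer category id to a dense 64-dimensional vector,
# instead of a 1024-wide one-hot column block.
embedded = tf.keras.layers.Embedding(vocabulary_size, embedding_size)(categorical_input)
embedded = tf.keras.layers.Flatten()(embedded)

# Concatenate with the numeric features and apply a logistic-regression head.
features = tf.keras.layers.Concatenate()([embedded, numeric_input])
logits = tf.keras.layers.Dense(1)(features)

model = tf.keras.Model([categorical_input, numeric_input], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
```

The embedding weights are learned jointly with the regression coefficients, so the model discovers which levels behave similarly rather than treating each level independently.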

dustinvtran avatar Jul 14 '18 16:07 dustinvtran

Thanks Dustin, I'll try that out.

allenchng avatar Jul 16 '18 14:07 allenchng

Gonna reopen as it would be a great tutorial/example to have in TFP!

dustinvtran avatar Jul 17 '18 02:07 dustinvtran

Hi @dustinvtran, I would like to work on this and add an example that uses categorical features. Thanks

badlogicmanpreet avatar Feb 01 '19 17:02 badlogicmanpreet

@dustinvtran @allenchng I would like to contribute by writing a tutorial.

Tasfia-Ara avatar Feb 05 '22 08:02 Tasfia-Ara