                        Categorical features in logistic regression
Hi,
I'm following the logistic regression example (https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/logistic_regression.py) and trying to implement it with some real-life data that has a mixture of numerical and categorical variables. Each of my categorical variables has a large number of levels though, and I'm wondering if one-hot encoding is the best approach. Is there a better way of treating my categorical variables in TF/TFP? Thanks.
Have you tried embedding layers, e.g., https://www.tensorflow.org/guide/embedding ? An embedding layer compresses one-hot encodings into a low-dimensional real-valued space (e.g., from vocabulary_size=1024 down to embedding_size=64). This hasn't typically been done for regression analyses, but it's a natural idea once you're familiar with the technique. It would be nice to see a tutorial or example with it.
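In case it helps, here's a minimal NumPy sketch of what an embedding lookup does, using the vocabulary_size=1024 / embedding_size=64 numbers above (all other names and values here are made up for illustration). In TF you'd use `tf.keras.layers.Embedding` and learn the matrix jointly with the regression weights; the point of the sketch is just that a lookup is equivalent to a one-hot matmul, without ever materializing the one-hot matrix:

```python
import numpy as np

# Illustrative sizes: a categorical feature with a large vocabulary,
# compressed into a small dense space.
vocabulary_size = 1024
embedding_size = 64

rng = np.random.default_rng(0)
# The embedding matrix: one dense vector per category (learned in practice).
embedding = rng.normal(size=(vocabulary_size, embedding_size))

# A batch of categorical feature values, encoded as integer indices.
category_ids = np.array([3, 17, 3, 999])

# An embedding lookup is just row indexing...
dense_features = embedding[category_ids]
print(dense_features.shape)  # (4, 64)

# ...which is mathematically the same as multiplying the one-hot
# encoding by the embedding matrix, but skips building the
# (batch, vocabulary_size) one-hot matrix entirely.
one_hot = np.eye(vocabulary_size)[category_ids]
assert np.allclose(one_hot @ embedding, dense_features)
```

The resulting dense features can then be concatenated with your numerical features and fed into the logistic regression as usual.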
Thanks Dustin, I'll try that out.
Gonna reopen as it would be a great tutorial/example to have in TFP!
Hi @dustinvtran, I would like to work on this and add an example that uses categorical features. Thanks.
@dustinvtran @allenchng I would like to contribute by writing a tutorial.