membership-inference

How to convert input_var to a string matrix if train_feat_file contains strings instead of floats?

Open SMJT01 opened this issue 6 years ago • 2 comments

Hi csong, I am new to Tensor. I wanted to try your code, but my dataset contains string data instead of floating-point values. How should I modify the code in my case? Could you please help? Thanks.

SMJT01, Jan 21 '19 01:01

Hi SMJT01,

Your string features can be treated as categorical data. There are quite a few ways of encoding these string values as numbers that the ML model can then interpret.

The following article gives quick descriptions of some of these methods: https://towardsdatascience.com/smarter-ways-to-encode-categorical-data-for-machine-learning-part-1-of-3-6dca2f71b159

Two of these encoders are also implemented in the official scikit-learn library:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html

Other implementations can be found here too:
http://contrib.scikit-learn.org/categorical-encoding/
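As a rough illustration of the two scikit-learn encoders mentioned above, here is a minimal sketch. The `raw_features` array and its values are made up for the example; in practice the strings would come from your own feature file, and the resulting float32 matrix would take the place of the numeric features the code expects. This is not taken from the repository, just one way it could be done.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical string-valued feature matrix (rows = samples, columns = features).
raw_features = np.array([
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
], dtype=object)

# One-hot encoding: one binary column per distinct value in each feature column.
# fit_transform returns a sparse matrix by default, so convert it to a dense array.
one_hot = OneHotEncoder(handle_unknown="ignore", dtype=np.float32)
train_x_onehot = one_hot.fit_transform(raw_features).toarray()

# Ordinal encoding: one code per distinct value in each column.
# Note that this implies an ordering between categories that may not be meaningful.
ordinal = OrdinalEncoder(dtype=np.float32)
train_x_ordinal = ordinal.fit_transform(raw_features)

print(train_x_onehot.shape)   # 3 samples, 2 + 2 one-hot columns
print(train_x_ordinal.shape)  # 3 samples, 2 integer-coded columns
```

One-hot encoding is usually the safer choice for unordered categories, at the cost of a wider feature matrix. The same fitted encoder should be reused (via `transform`) on the test features so the columns line up.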

Imathatguy, Jan 22 '19 06:01

Hi Imathatguy, thank you very much for your reply. For now I have solved this issue by encoding the strings into a numerical dataset. But of course, if I could use categorical attributes directly, that would save me a lot of time. As far as I can tell, Theano tensors do not have anything for a string datatype. I'll try the scikit-learn packages and let you know whether they work well on the data. Thanks.

SMJT01, Jan 22 '19 21:01