Jan Motl comments

Results: 82 comments of Jan Motl

Circular categories encoding

I tested this transformation in the past with different models, and it never led to an improvement. Since then, whenever I have cyclical features and feel the need...
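For context, a minimal sketch of what a circular encoding usually looks like, assuming the transformation under discussion is the common sine/cosine mapping onto the unit circle (the comment itself does not spell it out). The names `encode_cyclical` and `circular_distance` are hypothetical and are not part of any library.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value (e.g. hour of day) to a point on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def circular_distance(a, b, period):
    """Euclidean distance between the encoded points of two cyclical values."""
    sin_a, cos_a = encode_cyclical(a, period)
    sin_b, cos_b = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Hour 23 and hour 0 end up as close as hour 0 and hour 1,
# which a plain integer encoding would not reflect.
wrap_gap = circular_distance(23, 0, 24)
step_gap = circular_distance(0, 1, 24)
```

Whether the two extra columns help a given model is an empirical question, which is the point the comment makes.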

WIP: Multi_hot encoder for ambiguous inputs

Nice tests. Just nitpicking: 1. In `test_multi_hot_fit` shouldn't: ```python self.assertEqual(enc.transform(X_t).shape[1], enc.transform(X_t[X_t['extra'] != 'A']).shape[1], 'We have to get the same count of columns') ``` become ```python self.assertEqual(enc.transform(X_t).shape[1], enc.transform(X_t[X_t['extra'] != 'D']).shape[1], #...

WIP: Multi_hot encoder for ambiguous inputs

Good work. > The transformation test was based on test_one_hot.py. That's actually a mistake of mine in test_one_hot.py. I will fix it. > Suffixes start with 1, not 0 in...

WIP: Multi_hot encoder for ambiguous inputs

@fullflu Please check that MultiHotEncoder conforms to the changes in master. All these changes were about the `handle_missing` and `handle_unknown` arguments, which should be supported by all encoders. Note that...

WIP: Multi_hot encoder for ambiguous inputs

Good. > I will check this if necessary. How serious is this warning? I just try to keep the test results free of errors and warnings - once I allow...

WIP: Multi_hot encoder for ambiguous inputs

Awesome. Move the example, and I will merge it. Note: just write somewhere that `|` assumes a uniform distribution of the feature values. For example, when the data contain `1|2`, the...
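One way to read the uniform-distribution note is sketched below, under the assumption that an ambiguous value such as `1|2` spreads its weight equally over the listed candidates. This is an illustration only, not the MultiHotEncoder implementation; `multi_hot_row` and its parameters are hypothetical names.

```python
def multi_hot_row(value, categories, delimiter="|"):
    """Encode one possibly ambiguous value as weights over known categories.

    Assumption: every candidate listed in the value is equally likely,
    so each one receives 1 / (number of distinct candidates).
    """
    # Drop repeated candidates while keeping their order.
    candidates = list(dict.fromkeys(str(value).split(delimiter)))
    weight = 1.0 / len(candidates)
    return [weight if category in candidates else 0.0 for category in categories]


categories = ["1", "2", "3"]
ambiguous = multi_hot_row("1|2", categories)  # weight split between "1" and "2"
certain = multi_hot_row("3", categories)      # all weight on "3"
```

A non-uniform prior over the candidates would replace the constant `weight` with per-category weights.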

WIP: Multi_hot encoder for ambiguous inputs

Nice. I like the use of `assert_frame_equal()` (I didn't know that it existed). And that you wrote the default settings in the documentation. I propose to rename the optional argument...

WIP: Multi_hot encoder for ambiguous inputs

> implement and_delimiter How is it going to work? Is it similar to `TfidfVectorizer` or `CountVectorizer`? A potentially useful dataset for the functionality illustration: [data](sorry.vse.cz/~berka/challenge/pkdd1999/data_berka.zip), [description](https://sorry.vse.cz/~berka/challenge/PAST/index.html). In my opinion, it...

WIP: Multi_hot encoder for ambiguous inputs

> dictionary used as prior (hyperprior is [1,1,1,...],... Nice touch with the hyperprior.

WIP: Multi_hot encoder for ambiguous inputs

Hi, @fullflu. Is there something I can help you with?