Jan Motl comments

Results: 82 comments of Jan Motl

Broken inverse_transform for OrdinalEncoder when custom mapping in use!

Yes, that's a bug. The inverse expects the mapping to be a Series, but it is a map. Hence, the workaround is to use something like: ```python def test_inverse_with_mapping(self): df =...

Implement Target Encoding with Hierarchical Structure Smoothing

Can you write a PR? Just an idea: it could also work on domains like `github.com` or emails like `[email protected]`. You would just specify a list of delimiters (by default...
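The delimiter idea above can be sketched in a few lines. This is only an illustration, not code from the library: the function name `split_hierarchy`, its `delimiters` parameter, and the default delimiters are hypothetical, and the reversal assumes the rightmost token is the most general level (true for domain names, not necessarily for other data).

```python
import re


def split_hierarchy(value, delimiters=("@", ".")):
    """Split a string into hierarchy levels, most general level first.

    Hypothetical helper: assumes the rightmost token is the top level,
    as in domain names ("github.com" -> "com" is above "github").
    """
    # Build a regex that matches any one of the delimiters literally.
    pattern = "|".join(re.escape(d) for d in delimiters)
    # Drop empty tokens produced by leading, trailing, or doubled delimiters.
    parts = [p for p in re.split(pattern, value) if p]
    # Reverse so the most general level comes first.
    return parts[::-1]


print(split_hierarchy("github.com"))        # ['com', 'github']
print(split_hierarchy("user@example.com"))  # ['com', 'example', 'user']
```

Each returned list can then be treated as a path from the top of the hierarchy down to the individual value.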

Implement Target Encoding with Hierarchical Structure Smoothing

@jkleint I think I need examples. Are we talking about branching based on the values? E.g.: Postal addresses. Some countries (like the USA) divide to countries. But other countries (like...

Implement Target Encoding with Hierarchical Structure Smoothing

That sounds reasonable. Just note that when we get '[email protected]' at scoring time, we will have to look up keys for values, because 'tim' was not observed during the...

Implement Target Encoding with Hierarchical Structure Smoothing

Another option: If TargetEncoder observes a column of dtype==list, it will treat the column as hierarchical with the first item being the topmost level. And it would be up...
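To make the list-per-row idea concrete, here is a minimal pure-Python sketch. It is not TargetEncoder's implementation: the class name `HierarchicalMeanEncoder`, the smoothing weight `m`, and the choice to shrink each level toward its parent's estimate (an m-estimate) are all assumptions made for illustration. A level that was not seen during fitting falls back to the estimate of its closest seen ancestor.

```python
from collections import defaultdict


class HierarchicalMeanEncoder:
    """Hypothetical sketch: target mean per hierarchy prefix, smoothed toward the parent."""

    def __init__(self, m=1.0):
        self.m = m  # assumed smoothing weight given to the parent estimate
        self.stats = defaultdict(lambda: [0, 0.0])  # prefix -> [count, target sum]
        self.global_mean = 0.0

    def fit(self, paths, targets):
        # Each path is a list with the topmost level first.
        targets = list(targets)
        self.global_mean = sum(targets) / len(targets)
        for path, y in zip(paths, targets):
            # Accumulate statistics for every prefix of the path.
            for depth in range(1, len(path) + 1):
                entry = self.stats[tuple(path[:depth])]
                entry[0] += 1
                entry[1] += y
        return self

    def encode(self, path):
        # Start from the global mean and refine level by level.
        estimate = self.global_mean
        for depth in range(1, len(path) + 1):
            prefix = tuple(path[:depth])
            if prefix not in self.stats:
                break  # unseen level: keep the closest seen ancestor's estimate
            count, total = self.stats[prefix]
            estimate = (total + self.m * estimate) / (count + self.m)
        return estimate


paths = [["com", "github", "alice"], ["com", "github", "bob"], ["org", "example", "carol"]]
encoder = HierarchicalMeanEncoder(m=1.0).fit(paths, [1, 0, 1])
print(encoder.encode(["com", "github", "tim"]))  # same value as ["com", "github"]
```

With this design, a value never observed during training still receives an informative encoding from its parent levels rather than only the global mean.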

Implement Target Encoding with Hierarchical Structure Smoothing

Ok. @jkleint has the longest comment. He implements the hierarchical processing for TargetEncoder. @JoshuaC3 implements the CountEncoder. @jkleint Be careful about deep hierarchies. E.g.: If each level is represented by...

[Feature request] Benchmark of encoding strategies for different tasks

The benchmarks are discussed in https://github.com/scikit-learn-contrib/categorical-encoding/issues/46. The results for classification are in `examples/benchmarking_large/output`. We don't currently have a benchmark for regression - if you would be willing to write one,...

Target encoding a feature where multiple values are allowed?

Count Encoding

Circular categories encoding