Jan Motl

82 comments by Jan Motl

Yes, that's a bug. The inverse expects the mapping to be a Series, but it is a map. Hence, the workaround is to use something like: ```python def test_inverse_with_mapping(self): df =...

Can you write a PR? Just an idea: it could also work on domains like `github.com` or emails like `[email protected]`. You would just specify a list of delimiters (by default...
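A minimal sketch of the delimiter idea above, assuming plain string inputs. The helper name `split_levels` and the default delimiters `.` and `@` are hypothetical choices for illustration only; the comment's actual default is cut off, and none of this is part of the library.

```python
import re


def split_levels(value, delimiters=(".", "@")):
    """Split a string into parts on any of the given delimiters.

    Hypothetical helper; the default delimiters are an assumption.
    """
    # Escape each delimiter so characters like "." are matched literally.
    pattern = "|".join(re.escape(d) for d in delimiters)
    # Drop empty parts produced by leading, trailing, or repeated delimiters.
    return [part for part in re.split(pattern, value) if part]


domain_levels = split_levels("github.com")
email_levels = split_levels("user@example.org")
```

Which end of the resulting list counts as the top level of the hierarchy is left open here, since the comment does not say.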

@jkleint I think I need examples. Are we talking about branching based on the values? E.g.: Postal addresses. Some countries (like the USA) divide to countries. But other countries (like...

That sounds reasonable. Just note that when we get '[email protected]' at scoring time, we will have to look up keys for values, because 'tim' was not observed during the...

Another option: If TargetEncoder observes a column of dtype==list, it will treat the column as hierarchical, with the first item being the topmost level. And it would be up...

Ok. @jkleint has the longest comment. He implements the hierarchical processing for TargetEncoder. @JoshuaC3 implements the CountEncoder. @jkleint Be careful about deep hierarchies. E.g.: If each level is represented by...

The benchmarks are discussed in https://github.com/scikit-learn-contrib/categorical-encoding/issues/46. The results for classification are in `examples/benchmarking_large/output`. We don't currently have a benchmark for regression - if you would be willing to write one,...

The library does not currently support that. But PRs are always welcome. A simple workaround is to sort all items in the cart alphabetically, so `[A,B]` and `[B,A]` will always...
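The sorting workaround could look roughly like this. It is a sketch that assumes each cart is a list of strings; `canonical_cart` is a hypothetical helper, not a function of the library.

```python
def canonical_cart(items):
    """Return an order-independent string key for a cart (hypothetical helper)."""
    # Sorting first means the same set of items always yields the same key.
    return ",".join(sorted(items))


carts = [["A", "B"], ["B", "A"], ["C"]]
keys = [canonical_cart(cart) for cart in carts]
```

The resulting string keys can then be passed to an encoder as an ordinary categorical column.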

A PR would be greatly appreciated.

Hi. I am curious to see what you propose.