eli5
FeatureUnhasher does not support an input_type of dict
The current implementation only supports hashers with input_type='string'. It would be nice to have a FeatureUnhasher that also accepts a FeatureHasher with input_type='dict'.
As I recall, we added FeatureUnhasher mainly to support HashingVectorizer, so we started with 'string'.
At first sight, adding input_type='dict' support is a matter of removing the exception, changing the way the _term_counts attribute is computed, and adding some tests.
I don't have immediate plans to implement this feature, but it looks like a good problem for new contributors, so pull requests are welcome!
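To make the `_term_counts` part more concrete, here is a rough sketch of how terms could be counted for dict input. It is not eli5's actual code: `iter_dict_terms` is a hypothetical helper, and counting each term once per occurrence (rather than weighting by the feature value) is an assumption. The one thing it relies on is scikit-learn's documented FeatureHasher convention that a string value is hashed as "key=value", while a numeric value is hashed under the key alone.

```python
from collections import Counter


def iter_dict_terms(X):
    """Yield the feature names that would be hashed for dict samples.

    Hypothetical helper, not part of eli5. Assumes values are either
    numbers or strings.
    """
    for sample in X:
        for key, value in sample.items():
            if isinstance(value, str):
                # String values are hashed as "key=value" with weight 1.
                yield "%s=%s" % (key, value)
            else:
                # Numeric values are hashed under the key itself.
                yield key


# Stand-in for the unhasher's term counter.
term_counts = Counter()
term_counts.update(iter_dict_terms([
    {"color": "red", "height": 1.5},
    {"color": "red", "width": 2},
]))
print(term_counts.most_common())
```

With something like this in place, the 'string' and 'dict' cases would differ only in how the terms fed to the counter are produced; the rest of the unhashing logic could probably stay the same.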
Is it okay if I work on this issue, provided nobody else is working on it?
@kmike As I was going through the tests, I noticed there are no tests for FeatureUnhasher. Is that right?
@coderop2 right; adding them can be a good first step. It is tested only indirectly, by testing InvertableHashingVectorizer which uses FeatureUnhasher internally.
So we can first add the functionality for input_type='dict', and then add tests for both input types together?
@coderop2 yes, this works. Alternatively, you can start by adding tests for the existing FeatureUnhasher, to get your feet wet; this would be a smaller change which can be merged separately.