eli5 icon indicating copy to clipboard operation
eli5 copied to clipboard

FeatureUnhasher does not support an input_type of dict

Open manyu90 opened this issue 7 years ago • 6 comments

The current implementation only supports input types of String. It will be nice to have a FeatureUnhasher which accepts Featurehashers of input type dict

manyu90 avatar Aug 08 '17 20:08 manyu90

As I recall, we added FeatureUnhasher mainly to support HashingVectorizer, so we started with 'string'.

On a first sight, adding input_type='dict' support it a matter of removing the exception, changing the way _term_counts attribute is computed, and adding some tests.

I don't have immediate plans to implement this feature, but it looks like a good problem for new contributors, so pull requests are welcome!

kmike avatar Aug 09 '17 07:08 kmike

Is it okay if I work on this issue, I mean if nobody else is working on this?

mohdkashif93 avatar Jun 30 '18 21:06 mohdkashif93

@kmike as i was going through tests there are no tests for the function featureunhasher..?

coderop2 avatar Mar 05 '19 16:03 coderop2

@coderop2 right; adding them can be a good first step. It is tested only indirectly, by testing InvertableHashingVectorizer which uses FeatureUnhasher internally.

kmike avatar Mar 05 '19 19:03 kmike

So first we can include the functionality for input_type dict and then add tests for both together

On Wed, Mar 6, 2019, 1:10 AM Mikhail Korobov [email protected] wrote:

@coderop2 https://github.com/coderop2 right; adding them can be a good first step. It is tested only indirectly, by testing InvertableHashingVectorizer which uses FeatureUnhasher internally.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/TeamHG-Memex/eli5/issues/236#issuecomment-469829066, or mute the thread https://github.com/notifications/unsubscribe-auth/AbHGbLeoo6utFnPNkeR5rXiKbIIwruV0ks5vTsgSgaJpZM4OxRnf .

coderop2 avatar Mar 05 '19 20:03 coderop2

@coderop2 yes, this works. Alternatively, one can start by adding tests for existing FeatureHasher, to get their feet wet; this would be a smaller change which can be merged separately.

kmike avatar Mar 05 '19 20:03 kmike