New transformer

Open Boorinio opened this issue 3 years ago • 5 comments
Hello, this is my first PR and I want to start off by saying I am really happy I discovered this project!

This PR includes a small fix for the text normalize class, which according to the docs should be categorical. It also includes a new transformer that shuffles the words of a given string. I used the transformer in a personal project, and it is useful for cases where the order of the words is not important (for example, product titles that need to be classified into categories).
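To illustrate the idea, here is a minimal Python sketch of a word-shuffling step. It is not the code from this PR and not Rubix ML's transformer interface; the function name `shuffle_words` and the `seed` parameter are hypothetical, and splitting on whitespace is an assumption about how words are delimited.

```python
import random
from typing import Optional


def shuffle_words(text: str, seed: Optional[int] = None) -> str:
    """Return `text` with its whitespace-separated words in random order.

    Hypothetical helper for illustration only; `seed` makes runs repeatable.
    """
    rng = random.Random(seed)
    words = text.split()   # assumption: words are separated by whitespace
    rng.shuffle(words)     # in-place shuffle of the word list
    return " ".join(words)


# The shuffled title keeps exactly the same words, only their order may change.
title = "Red cotton blanket"
shuffled = shuffle_words(title, seed=42)
print(shuffled)
```

Applied to training samples, a step like this would act as a form of data augmentation: the same product title is seen with its words in different orders.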

Have a nice day, I hope this helps!

Boorinio avatar Jul 09 '22 10:07 Boorinio

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

github-actions[bot] avatar Jul 09 '22 10:07 github-actions[bot]

I have read the CLA Document and I hereby sign the CLA

Boorinio avatar Jul 09 '22 11:07 Boorinio

Hey nice PR @Boorinio! I'm struggling to think of the use cases for this Transformer though. Can you help me?

Hey, thanks for the reply! It's not a widely used technique, as in most NLP problems the order of the words is actually really important. But as I mentioned in my initial comment, there are problems where the order of the words should be disregarded, for example product titles that we want to classify into specific categories ("Red cotton blanket" is matched to blankets, and "Blanket red cotton" should be matched to the same category). I opened this PR because I used this filter for a personal project, but I do understand that it might be too specific. If that's the case, you can close this PR :)

Boorinio avatar Jul 11 '22 18:07 Boorinio

This transformer will make sense when we have RNNs. For the bag-of-words classifiers available, it will not make much sense.
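The point about bag-of-words models can be shown with a small Python sketch: a bag-of-words representation only counts tokens, so reordering the words leaves it unchanged. The helper `bag_of_words` is hypothetical, and lowercasing plus whitespace tokenization are assumptions made for the example, not a description of any particular library's tokenizer.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase, whitespace-separated tokens (illustrative tokenizer)."""
    return Counter(text.lower().split())


original = bag_of_words("Red cotton blanket")
reordered = bag_of_words("Blanket red cotton")

# Both titles produce identical token counts, so a classifier fed these
# counts cannot tell the two orderings apart.
print(original == reordered)
```

A sequence model such as an RNN, by contrast, consumes tokens in order, which is where shuffled copies of a sample would actually look different to the learner.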

@Boorinio, care to share which classifier you were using this transformer with?

DrDub avatar Sep 05 '22 02:09 DrDub

I'm struggling to think of use cases for this. Maybe they will become more apparent in the future, but for now I think we should put this in the Extras package. @Boorinio, would you mind submitting a PR to the Extras repo?

https://github.com/RubixML/Extras

andrewdalpino avatar Sep 13 '22 00:09 andrewdalpino

Hey, thanks for the responses and sorry for the big delay. Closing this one!

Boorinio avatar Nov 06 '22 19:11 Boorinio