New transformer
Hello, this is my first PR, and I want to start off by saying I am really happy I discovered this project!
This PR includes a small fix for the text normalize class, which according to the docs should be categorical. It also includes a new transformer that shuffles the words of a given string. I used the transformer in a personal project, and it is useful for cases where the order of the words is not important (for example, product titles that need to be classified into categories).
Have a nice day, I hope this helps!
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
Hey nice PR @Boorinio! I'm struggling to think of the use cases for this Transformer though. Can you help me?
Hey, thanks for the reply! It's not a widely used technique, since in most NLP problems the order of the words is really important. But as I mentioned in my initial comment, there are problems where the order of the words should be disregarded, for example product titles that we want to classify into specific categories ("Red cotton blanket" is matched to blankets, and "Blanket red cotton" should be matched to the same category). I opened this PR because I used this filter in a personal project, but I do understand that it might be too specific. If that's the case, you can close this PR :)
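For readers following along, here is a minimal sketch of the idea described above, written in Python purely for illustration. It is not the code from this PR and not the library's API; the function name `shuffle_words` and the optional `rng` parameter are hypothetical.

```python
import random


def shuffle_words(text, rng=None):
    """Return `text` with its whitespace-separated words in random order.

    Hypothetical helper for illustration only; not the transformer from the PR.
    """
    if rng is None:
        rng = random.Random()
    words = text.split()      # split on any run of whitespace
    rng.shuffle(words)        # in-place shuffle of the word list
    return " ".join(words)    # re-join with single spaces


title = "Red cotton blanket"
# Passing a seeded generator makes the shuffle reproducible between runs.
shuffled = shuffle_words(title, random.Random(0))
print(shuffled)
```

The output always contains the same words as the input, only the order changes, so a model trained on shuffled titles sees the same vocabulary in varying positions.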
This transformer will make sense when we have RNNs. For the bag-of-words classifiers available, it will not make much sense.
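To illustrate the point about bag-of-words models, here is a small Python sketch (an illustration only, not the library's vectorizer; `bag_of_words` is a hypothetical helper). A bag-of-words representation only counts how often each word occurs, so two titles with the same words in a different order produce identical features.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens (hypothetical helper)."""
    return Counter(text.lower().split())


a = bag_of_words("Red cotton blanket")
b = bag_of_words("Blanket red cotton")

# Word order is discarded by the counting step, so the two bags are equal.
same = a == b
print(same)  # True
```

Because the features are already order-invariant, shuffling the words beforehand would not change what such a classifier sees.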
@Boorinio, care to share which classifier you were using this transformer with?
I'm struggling to think of use cases for this. Maybe they will become more apparent in the future, but for now I think we should put this in the Extras package. @Boorinio, would you mind submitting a PR to the Extras repo?
https://github.com/RubixML/Extras
Hey, thanks for the responses, and sorry for the long delay. Closing this one!