zeugma
Defining a custom tokenizer
It would be useful to be able to define our own tokenizer, or to pass an already-tokenized list of tokens to transform. I'm using a workaround at the moment, but in sklearn you can pass your own tokenizer and preprocessor.
Good idea. We could support passing an optional tokenizer when instantiating EmbeddingTransformer. Feel free to open a PR if you have time to give it a try.
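A minimal sketch of how the suggested option might look. The class name TokenizingEmbeddingTransformer and its embeddings, tokenizer and preprocessor parameters are hypothetical, loosely modeled on sklearn's tokenizer/preprocessor convention. They are not zeugma's actual EmbeddingTransformer API. The sketch assumes a plain dict of word vectors, averages the known word vectors per document, and also accepts documents that are already lists of tokens.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Union

# A document is either raw text or an already-tokenized sequence of tokens.
Document = Union[str, Sequence[str]]


class TokenizingEmbeddingTransformer:
    """Hypothetical sketch (not zeugma's API): mean of word vectors per document,
    with a pluggable tokenizer and preprocessor."""

    def __init__(
        self,
        embeddings: Dict[str, List[float]],
        tokenizer: Optional[Callable[[str], List[str]]] = None,
        preprocessor: Optional[Callable[[str], str]] = None,
    ):
        self.embeddings = embeddings
        self.tokenizer = tokenizer
        self.preprocessor = preprocessor
        # Assumes every vector has the same dimension.
        self.dim = len(next(iter(embeddings.values())))

    def fit(self, X, y=None):
        # Nothing to learn; kept for sklearn-style fit/transform compatibility.
        return self

    def _tokens(self, doc: Document) -> List[str]:
        if not isinstance(doc, str):
            # Already tokenized: use the tokens exactly as given.
            return list(doc)
        if self.preprocessor is not None:
            doc = self.preprocessor(doc)
        # Fall back to whitespace splitting when no tokenizer is supplied.
        tokenizer = self.tokenizer or str.split
        return tokenizer(doc)

    def transform(self, X: Iterable[Document]) -> List[List[float]]:
        out = []
        for doc in X:
            # Skip tokens that have no embedding.
            vectors = [self.embeddings[t] for t in self._tokens(doc) if t in self.embeddings]
            if not vectors:
                # No known tokens: return a zero vector of the right size.
                out.append([0.0] * self.dim)
                continue
            # Element-wise mean across the document's word vectors.
            out.append([sum(col) / len(vectors) for col in zip(*vectors)])
        return out


if __name__ == "__main__":
    emb = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}
    t = TokenizingEmbeddingTransformer(emb, preprocessor=str.lower)
    print(t.transform(["Hello World", ["hello"], "unknown"]))
    # [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]
```

In this sketch, pre-tokenized input bypasses both the preprocessor and the tokenizer. That mirrors the "pass a list of tokens to transform" request, but it means such tokens must already be normalized (e.g. lowercased) by the caller.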