Defining own tokenizer

Open • NathanHoy96 opened this issue 3 years ago • 1 comment

It would be good to be able to define our own tokenizer, or to pass a list of tokens directly to transform. I'm using a workaround at the moment, but scikit-learn's text vectorizers let you pass your own tokenizer and preprocessor.

NathanHoy96 • Jan 05 '22 13:01

Good idea. We could support this by accepting an optional tokenizer when EmbeddingTransformer is instantiated. Feel free to open a PR if you have time to give it a try.

nkthiebaut • Jul 21 '23 18:07
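
For anyone picking this up, here is a minimal sketch of what an optional tokenizer argument could look like. It is not zeugma's code: `SketchEmbeddingTransformer`, `default_tokenizer`, `phrase_tokenizer`, and the toy vectors are all hypothetical names made up for illustration, and averaging the word vectors is an assumption about the aggregation step. The idea mirrors the `tokenizer` argument on scikit-learn's `CountVectorizer`: accept any callable that maps a string to a list of tokens, and fall back to a default when none is given.

```python
from typing import Callable, Dict, List, Optional, Sequence


def default_tokenizer(text: str) -> List[str]:
    """Lowercase and split on whitespace (an assumed default, for illustration)."""
    return text.lower().split()


class SketchEmbeddingTransformer:
    """Hypothetical stand-in for illustration, NOT zeugma's EmbeddingTransformer."""

    def __init__(
        self,
        vectors: Dict[str, Sequence[float]],
        dim: int,
        tokenizer: Optional[Callable[[str], List[str]]] = None,
    ) -> None:
        self.vectors = vectors
        self.dim = dim
        # Use the caller's tokenizer if one is given, else the default.
        self.tokenizer = tokenizer or default_tokenizer

    def transform_one(self, text: str) -> List[float]:
        # Keep only tokens that have a vector, then average them.
        tokens = [t for t in self.tokenizer(text) if t in self.vectors]
        if not tokens:
            return [0.0] * self.dim
        return [
            sum(self.vectors[t][i] for t in tokens) / len(tokens)
            for i in range(self.dim)
        ]

    def transform(self, texts: Sequence[str]) -> List[List[float]]:
        return [self.transform_one(text) for text in texts]


# Toy vectors, made up for this example.
toy_vectors = {"new": [1.0, 0.0], "york": [0.0, 1.0], "new_york": [2.0, 2.0]}


def phrase_tokenizer(text: str) -> List[str]:
    """Custom tokenizer that keeps 'new york' together as a single token."""
    return text.lower().replace("new york", "new_york").split()


default_model = SketchEmbeddingTransformer(toy_vectors, dim=2)
custom_model = SketchEmbeddingTransformer(toy_vectors, dim=2, tokenizer=phrase_tokenizer)

default_result = default_model.transform(["New York"])  # [[0.5, 0.5]]
custom_result = custom_model.transform(["New York"])    # [[2.0, 2.0]]
```

Taking a plain callable rather than a dedicated Tokenizer class keeps the change small and lets users plug in an existing tokenizer function without wrapping it. Supporting already-tokenized input, as the issue also asks for, could then be handled by passing a tokenizer that returns its input unchanged.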