tokenizers
feat: add cli for tokenizer and training
This PR adds a CLI with two subcommands:
- Tokenize: tokenizes a text using a given model
- Train: trains a new tokenizer model
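For illustration, here is a rough sketch of how the two subcommands could be dispatched. It uses only the standard library so it stays dependency-free. The names `Command` and `parse_args`, and the positional argument layout, are assumptions made for this example and are not taken from the PR's actual code.

```rust
// Hypothetical sketch: the type, function, and argument layout below are
// assumptions for illustration, not the PR's implementation.
#[derive(Debug, PartialEq)]
enum Command {
    /// Tokenize `text` using the tokenizer model stored at `model`.
    Tokenize { model: String, text: String },
    /// Train a new tokenizer model from `files` and write it to `output`.
    Train { output: String, files: Vec<String> },
}

/// Parse command-line arguments (without the program name) into a `Command`.
fn parse_args(args: &[String]) -> Result<Command, String> {
    match args {
        // tokenize <model> <text>
        [cmd, model, text] if cmd == "tokenize" => Ok(Command::Tokenize {
            model: model.clone(),
            text: text.clone(),
        }),
        // train <output> <file>... (at least one training file is required)
        [cmd, output, files @ ..] if cmd == "train" && !files.is_empty() => {
            Ok(Command::Train {
                output: output.clone(),
                files: files.to_vec(),
            })
        }
        _ => Err("usage: tokenize <model> <text> | train <output> <file>...".to_string()),
    }
}
```

A binary could collect `std::env::args().skip(1)` into a `Vec<String>`, pass it to `parse_args`, and then call into the library for whichever variant comes back.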
If it's that small and adds new dependencies, I feel like it should be its own crate. tokenizers is a library; it shouldn't be a CLI as well.
This is a valid point... I am not sure if we can define bin dependencies separately.
We can create a new crate just for the CLI instead; that does make sense. If this is highly requested, happy to do it!