tokenizers icon indicating copy to clipboard operation
tokenizers copied to clipboard

feat: add cli for tokenizer and training

Open b00f opened this issue 4 months ago • 3 comments

This PR adds CLI with two subcommands:

  1. Tokenize: To tokenize a text using a given model
  2. Train: To train a new tokenize model

b00f avatar Aug 06 '25 10:08 b00f

If it's that small, and adds new dependencies, I feel like it should be it's own crate. tokenizers is a library, it shouldn't be a CLI as well.

Narsil avatar Sep 04 '25 14:09 Narsil

If it's that small, and adds new dependencies, I feel like it should be it's own crate. tokenizers is a library, it shouldn't be a CLI as well.

This is a valid point... I am not sure if we can define bin dependencies separately.

b00f avatar Sep 06 '25 11:09 b00f

We can do a new crate just fort the CLI, it does make sense instead. If this is highly requested happy to do!

ArthurZucker avatar Sep 08 '25 15:09 ArthurZucker