onnxruntime_flutter

For beginners, this library is practically useless without a viable tokenizer in the Dart ecosystem

Open lamualfa opened this issue 1 year ago • 5 comments

To run inference with a text model, you first need to preprocess the input into tensors using a tokenizer. Currently there is no viable tokenizer available in the Dart ecosystem, which makes this library practically useless for most beginners.
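To make the preprocessing step concrete, here is a minimal sketch in Rust of what a tokenizer produces. It is a toy whitespace tokenizer with a made-up vocabulary, not the WordPiece or BPE scheme a real model was trained with, so its ids would not match any actual model. The names `ToyTokenizer` and `Encoded` and the id assignments are assumptions for illustration only.

```rust
use std::collections::HashMap;

/// Token ids plus attention mask: the integer arrays a text model
/// typically expects as input tensors.
#[derive(Debug, PartialEq)]
pub struct Encoded {
    pub input_ids: Vec<i64>,
    pub attention_mask: Vec<i64>,
}

/// Toy whitespace tokenizer over a fixed vocabulary (hypothetical ids).
pub struct ToyTokenizer {
    vocab: HashMap<String, i64>,
    unk_id: i64,
    pad_id: i64,
}

impl ToyTokenizer {
    /// Builds a vocabulary from a word list.
    /// Ids 0 and 1 are reserved for padding and unknown words.
    pub fn new(words: &[&str]) -> Self {
        let vocab = words
            .iter()
            .enumerate()
            .map(|(i, w)| (w.to_string(), i as i64 + 2))
            .collect();
        Self { vocab, unk_id: 1, pad_id: 0 }
    }

    /// Lowercases, splits on whitespace, maps words to ids,
    /// then truncates or pads the result to exactly `max_len` entries.
    pub fn encode(&self, text: &str, max_len: usize) -> Encoded {
        let mut input_ids: Vec<i64> = text
            .split_whitespace()
            .map(|w| *self.vocab.get(&w.to_lowercase()).unwrap_or(&self.unk_id))
            .take(max_len)
            .collect();
        let real = input_ids.len();
        input_ids.resize(max_len, self.pad_id);
        // 1 marks a real token, 0 marks padding.
        let attention_mask = (0..max_len)
            .map(|i| if i < real { 1 } else { 0 })
            .collect();
        Encoded { input_ids, attention_mask }
    }
}
```

With the vocabulary `["hello", "world"]`, encoding "Hello brave world" at length 5 gives ids `[2, 1, 3, 0, 0]` and mask `[1, 1, 1, 0, 0]`. The hard part in practice is reproducing the exact vocabulary and splitting rules the model was trained with, which is why a maintained tokenizer library matters.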

Unless you want to write your own tokenizer, don't waste your time searching for examples of how to use this library. Alternatively, you can go back to Python or JavaScript, which have well-known tokenizers such as the ones maintained by HuggingFace:

  • Python https://huggingface.co/docs/transformers/en/index
  • JavaScript https://huggingface.co/docs/transformers.js/en/index

lamualfa avatar Feb 28 '25 17:02 lamualfa

Perhaps we can ask an AI (ChatGPT or DeepSeek) to help us complete the above task.

gtbluesky avatar Apr 03 '25 01:04 gtbluesky

I would highly recommend using the HuggingFace tokenizers Rust library via the Flutter/Rust Bridge (flutter_rust_bridge). We use this internally and it works great.

brian-at-pieces avatar Apr 08 '25 17:04 brian-at-pieces

Hey, I have been trying this for ages now. I am not sure what I am doing wrong on the Rust side. The crate is not scanned correctly, so the generated bindings are wrong. Is there any way you could help out or show me your public API for the tokenizers? :)

I would appreciate it A LOT.

alan-insam avatar Apr 25 '25 15:04 alan-insam

Sorry, it's a private project, so I can't share the code. Are you trying to generate bindings for the entire tokenizers lib? If so, I would recommend writing a simple wrapper Rust file that exposes only the functionality you need from tokenizers, and then generating bindings for that one file.
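As a rough illustration of the wrapper idea (not the private code mentioned above), the sketch below shows the shape such a file might take: a couple of free functions whose signatures use only plain types, with the third-party tokenizer hidden behind an opaque handle. So that it compiles with the standard library alone, `InnerTokenizer` is a stand-in with placeholder logic. In a real project it would be replaced by the corresponding calls into the `tokenizers` crate. All names here (`TokenizerHandle`, `EncodeOutput`, `create_tokenizer`, `encode_text`) are hypothetical.

```rust
// wrapper.rs (hypothetical file name): the only file handed to the
// binding generator, so nothing from the third-party crate is scanned.

/// Stand-in for the third-party tokenizer type (assumption for this sketch).
struct InnerTokenizer;

impl InnerTokenizer {
    /// Placeholder logic: one id per whitespace-separated word,
    /// equal to the word's length in bytes.
    fn encode(&self, text: &str) -> Vec<u32> {
        text.split_whitespace().map(|w| w.len() as u32).collect()
    }
}

/// Opaque handle: the inner type never appears in a public signature.
pub struct TokenizerHandle {
    inner: InnerTokenizer,
}

/// Plain data returned across the bridge, already in the integer
/// width that text models commonly take as input.
pub struct EncodeOutput {
    pub input_ids: Vec<i64>,
    pub attention_mask: Vec<i64>,
}

/// Creates the handle. A real wrapper would load the tokenizer
/// definition here and return an error if loading fails.
pub fn create_tokenizer() -> TokenizerHandle {
    TokenizerHandle { inner: InnerTokenizer }
}

/// Encodes one string and converts the result to plain vectors.
pub fn encode_text(handle: &TokenizerHandle, text: String) -> EncodeOutput {
    let ids = handle.inner.encode(&text);
    EncodeOutput {
        attention_mask: vec![1; ids.len()],
        input_ids: ids.into_iter().map(i64::from).collect(),
    }
}
```

Keeping the public surface to strings, integer vectors and one opaque struct means the generator only has to translate types it can clearly represent on the Dart side.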

brian-at-pieces avatar Apr 28 '25 13:04 brian-at-pieces

Hey Brian, thanks for getting back! I managed to implement it successfully yesterday. Thank you anyway, I appreciate it :)

alan-insam avatar Apr 28 '25 13:04 alan-insam