Nicolas Patry

Results: 978 comments by Nicolas Patry

More tests with actual failures

```
pytest -sv tests/ -k tokenizer
```

```
FAILED tests/models/bartpho/test_tokenization_bartpho.py::BartphoTokenizerTest::test_clean_up_tokenization_spaces - assert "[CLS]this sh...' ll go.[SEP]" == "[CLS] thi...
FAILED tests/models/blenderbot_small/test_tokenization_blenderbot_small.py::BlenderbotSmallTokenizerTest::test_clean_up_tokenization_spaces - assert "[CLS]this sh...'...
```

There are no C bindings currently available. However, you could probably set some up yourself relatively easily, depending on the surface you need: https://developers.redhat.com/articles/2022/09/05/how-create-c-binding-rust-library#logging_in_the_c_binding https://docs.rust-embedded.org/book/interoperability/rust-with-c.html Depending on your project you could...
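For illustration, here is a minimal sketch of what a hand-written C binding over a Rust function can look like. It does not use the tokenizers API: `count_tokens` and `example_count_tokens` are hypothetical names standing in for whatever part of the library surface you actually need to expose.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for the Rust functionality you want to expose.
/// A real binding would call into the library here instead.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// C-callable wrapper around `count_tokens`.
/// Returns -1 if `text` is null or is not valid UTF-8.
///
/// # Safety
/// `text` must be null or point to a valid NUL-terminated string.
// Note: on the 2024 edition this attribute is spelled `#[unsafe(no_mangle)]`.
#[no_mangle]
pub unsafe extern "C" fn example_count_tokens(text: *const c_char) -> i64 {
    if text.is_null() {
        return -1;
    }
    // Borrow the C string without copying, then validate it as UTF-8.
    match CStr::from_ptr(text).to_str() {
        Ok(s) => count_tokens(s) as i64,
        Err(_) => -1,
    }
}
```

Built with `crate-type = ["cdylib"]` (or `staticlib`) in `Cargo.toml`, the C side would declare this as `int64_t example_count_tokens(const char *text);`. Keeping the exported surface to plain pointers and integers, as here, avoids having to describe Rust types to C.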

Which version are you using? This was already fixed on main and in `0.14.1`: https://github.com/huggingface/tokenizers/blob/main/tokenizers/src/models/bpe/trainer.rs#L541-L546

@DavidAdamczyk Use a more recent tokenizers version, or an older Rust compiler version.

Any news on this pull request?

Hi @Matthieu-Tinycoaching

This is linked to: huggingface/api-inference-community#26

Community images do not implement:

- private models
- GPU inference
- Acceleration

So what you are seeing is quite normal and is...

Multiple points:

> However, pipeline expects the audio samples in the format

As far as I remember, we can also accept `array` for that reason. (`raw` came before `datasets` had...

> Thanks for the super in-depth explanation, @Narsil! Incredibly helpful and much appreciated 🤗

Well, your initial issue was also pretty comprehensive, so thanks for creating it.

> Maybe I'm...

> What is the big drawback of this?

This is already done; it's a doc issue. And specifically for sanchit, datasets are using `{"audio" : {"sampling_rate": .., "audio": ..}}` instead...