Patsakula Nikita

Results 14 issues of Patsakula Nikita

- [x] Update dependencies.
- [x] Bump version.
- [x] Run `sh update_lrgrammar.sh`.

Blocked by #660.

## Cargo check (a170459c0692a47d7dc1a0df44e9ec1fbb9efd50)
- Trailing semicolon removed.
## Cargo clippy (93ea60027606382ec4d9977b59e13842de1ea0e1)
- Extra referencing removed.
- Extra type annotations removed.
- Extra lifetime annotations removed.
- Extra `clone` removed....

Hello! I found a small bug in `build.rs` and would like to help :D
## Problem
The NixOS distribution (and some others) ships the `libtorch` shared symbols and headers separately. And it would be...

## Description
Hello! The new `libffi` library version doesn't contain the `argInt` function. I've made a small fix. It works for me, but I'm not sure about its correctness in general. Link to...

- [x] Implement NLLB tokenizer: https://github.com/guillaume-be/rust-tokenizers/pull/76 (waiting for review).
- [x] Extend language enum: ISO [tables](https://iso639-3.sil.org/) were used.
- [ ] Add resource links (blocked: the model is not converted).
-...

Hello! Facebook No Language Left Behind description page: https://github.com/facebookresearch/fairseq/tree/nllb
## Huggingface
It's already hosted on Huggingface.
https://huggingface.co/facebook/nllb-200-1.3B
https://huggingface.co/facebook/nllb-200-3.3B
https://huggingface.co/facebook/nllb-200-distilled-600M
## Plan
[ ] Merge NLLB support into `rust-tokenizers`: https://github.com/guillaume-be/rust-tokenizers/pull/76 [...

Master PR: #76
Blocked by: #79
## Motivation
1. Not every FS path is a valid UTF-8 string.
2. `AsRef` accepts strictly more argument types with the same level of correctness: you...
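To illustrate the motivation above, here is a minimal sketch of a function that is generic over `AsRef<Path>`. The helper `has_extension` is hypothetical and is not part of `rust-tokenizers`; it only shows that such a signature keeps accepting `&str` and `String` while also accepting `Path`, `PathBuf` and `OsStr` values, which do not have to be valid UTF-8.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper (an assumption for illustration, not a rust-tokenizers API).
/// Accepts anything that can be borrowed as a `Path`, so callers are not forced
/// to convert their path into a UTF-8 `&str` first.
pub fn has_extension<P: AsRef<Path>>(path: P, expected: &str) -> bool {
    // `Path::extension` returns `Option<&OsStr>`, so no UTF-8 conversion happens here.
    path.as_ref().extension() == Some(OsStr::new(expected))
}
```

Existing call sites that pass string literals compile unchanged, which is why the change can be seen as a pure widening of the accepted argument types.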

Master MR: https://github.com/guillaume-be/rust-tokenizers/pull/76
- [x] Add structural errors:
  - [x] Save source error type.
  - [x] Line location.
  - [x] Make reasonable error message.
- [ ] Fix tests ignored...
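The checklist items above (keeping the source error, recording the line location, and producing a readable message) can be sketched roughly as follows. This is only an illustration under stated assumptions: `LineParseError` and `parse_ids` are hypothetical names, and the actual error types in the PR may look different.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type (not the one from the PR): it keeps the
/// underlying error and the 1-based line where parsing failed.
#[derive(Debug)]
pub struct LineParseError {
    pub line: usize,
    pub source: ParseIntError,
}

impl fmt::Display for LineParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Human-readable message that includes the location.
        write!(f, "line {}: {}", self.line, self.source)
    }
}

impl Error for LineParseError {
    // Expose the original error so callers can inspect the error chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Hypothetical parser: reads one integer per line and stops at the first bad line.
pub fn parse_ids(text: &str) -> Result<Vec<i64>, LineParseError> {
    text.lines()
        .enumerate()
        .map(|(idx, line)| {
            line.trim()
                .parse::<i64>()
                .map_err(|source| LineParseError { line: idx + 1, source })
        })
        .collect()
}
```

Keeping the source error as a typed field, rather than formatting it into a string, lets callers match on the concrete failure while still getting a located message from `Display`.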

Hello, this is my initial NLLB tokenizer support MR!
Tokenizer config: https://huggingface.co/facebook/nllb-200-1.3B/blob/main/tokenizer_config.json
Special tokens: https://huggingface.co/facebook/nllb-200-1.3B/blob/main/special_tokens_map.json
Vocabulary + depth of unknown: https://huggingface.co/facebook/nllb-200-1.3B/blob/main/tokenizer.json
## Unsolved questions:
1. ~~`bos`/`cls`/`eos` are hardcoded constants, but...

When inserting increasing IDs, it can be useful to be able to get the result of the insertion. This is a very crude example, but I would like to have...