
Unsupervised text tokenizer for Neural Network-based text generation.

Results 102 sentencepiece issues

I need to use sentencepiece for tokenization, and I also need OpenVINO for NLP task inference. I am using vcpkg to manage both sentencepiece and OpenVINO. The protobuf for OpenVINO...

execution environment
protobuf
Details requested

When building with C++20, there is an error due to the default value of "-1" on L221-222: ``` constexpr unicode_script::ScriptType kAnyType = static_cast<unicode_script::ScriptType>(-1); ```

enhancement
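
One common way to make a "-1" sentinel well-defined, assuming the enum declaration itself can be changed, is to give the enum a fixed underlying type so that every `int` value is a valid value of the type. A minimal sketch follows; `ScriptType`, `kLatin`, `kCjk`, `kArabic`, and `IsAnyType` are hypothetical stand-ins, not sentencepiece's actual `unicode_script` definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for unicode_script::ScriptType (not the real header).
// Without ": int", an unscoped enum whose enumerators are 0..2 only has the
// values 0..3, so static_cast<...>(-1) is out of range, and some recent
// compilers (e.g. newer Clang releases) reject it in a constant expression.
enum ScriptType : int { kLatin, kCjk, kArabic };

// With a fixed underlying type, -1 is a valid ScriptType value.
constexpr ScriptType kAnyType = static_cast<ScriptType>(-1);

// True if `t` is the "any script" sentinel.
constexpr bool IsAnyType(ScriptType t) { return t == kAnyType; }

static_assert(static_cast<int>(kAnyType) == -1, "sentinel keeps its value");
```

A scoped `enum class ScriptType` has the same effect, since its underlying type defaults to `int`.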

I have some ID values and I want to train BPE on them. The following is an example of the ID values. ``` 26865, 5412, 26865, 26865, 26865, 26865, 5412, 5412,...
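
Conceptually, a BPE training step does not depend on text: it counts adjacent symbol pairs and merges the most frequent pair into a new symbol, which works the same way on integer IDs. A minimal sketch of one such merge step written directly on ID sequences; `MostFrequentPair`, `MergePair`, and the new ID passed in are hypothetical names for illustration, and this is not sentencepiece's trainer, which reads text input.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Seq = std::vector<long long>;
using Pair = std::pair<long long, long long>;

// Counts adjacent ID pairs across all sequences and returns the most frequent.
// Ties go to the smallest pair in std::map order (an arbitrary choice).
Pair MostFrequentPair(const std::vector<Seq>& corpus) {
  std::map<Pair, int> counts;
  for (const Seq& s : corpus)
    for (std::size_t i = 0; i + 1 < s.size(); ++i) ++counts[Pair(s[i], s[i + 1])];
  Pair best{};
  int best_count = 0;
  for (auto it = counts.begin(); it != counts.end(); ++it)
    if (it->second > best_count) {
      best = it->first;
      best_count = it->second;
    }
  return best;
}

// Replaces non-overlapping occurrences of `pair`, scanning left to right,
// with `new_id`.
Seq MergePair(const Seq& s, Pair pair, long long new_id) {
  Seq out;
  for (std::size_t i = 0; i < s.size();) {
    if (i + 1 < s.size() && s[i] == pair.first && s[i + 1] == pair.second) {
      out.push_back(new_id);
      i += 2;
    } else {
      out.push_back(s[i]);
      ++i;
    }
  }
  return out;
}
```

Repeating these two calls, with a fresh ID for each merge, until the desired vocabulary size is reached gives the full training loop.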

I think there is a bug in calculation of max_score in unigram_model.cc: https://github.com/google/sentencepiece/blob/6225e08edb2577757163b3f5dbba4c0b670ef445/src/unigram_model.cc#L657-L664 As FLT_MIN is a very small positive number (on my system it's 1.17549435e-38) and token scores are...

bug
Will fix in next release
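
The failure mode can be reproduced in isolation: `FLT_MIN` is the smallest positive normalized `float`, not the most negative one, so a running maximum seeded with it never drops below zero when every score is negative (as unigram piece scores, which are log probabilities, typically are). A minimal sketch comparing the two seeds; `MaxScore` is a hypothetical helper, not the code in unigram_model.cc.

```cpp
#include <cassert>
#include <cfloat>
#include <limits>
#include <vector>

// Returns the maximum of `scores`, using `init` as the starting value.
float MaxScore(const std::vector<float>& scores, float init) {
  float max_score = init;
  for (float s : scores) {
    if (s > max_score) max_score = s;
  }
  return max_score;
}
```

Seeding with `std::numeric_limits<float>::lowest()` (equal to `-FLT_MAX`) returns the true maximum; seeding with `FLT_MIN` returns `FLT_MIN` itself whenever all scores are negative.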

Hi, thanks for your great work on this. I noticed a subtle issue when playing with synthetic examples. The BPE algorithm works as expected but the unigram algorithm does not...

bug

This line is showing the error: ``` spm.SentencePieceTrainer.train('--input=dict.ja.txt --model_prefix=m --vocab_size=27034') ```

I recently encountered a compatibility issue when using `sentencepiece v0.2.0` together with the latest `transformers` and `tensorflow` packages. When I ran some Python script that imports `AutoProcessor` class from `transformers`, the...

Bumps the github-actions group with 3 updates in the / directory: [actions/upload-artifact](https://github.com/actions/upload-artifact), [actions/checkout](https://github.com/actions/checkout) and [actions/setup-python](https://github.com/actions/setup-python). Updates `actions/upload-artifact` from 3 to 4 Release notes Sourced from actions/upload-artifact's releases. v4.0.0 What's Changed...

dependencies
github_actions

Bumps the build-time-deps group with 3 updates in the /.github/workflows/requirements directory: [cibuildwheel](https://github.com/pypa/cibuildwheel), [pytest](https://github.com/pytest-dev/pytest) and [setuptools](https://github.com/pypa/setuptools). Updates `cibuildwheel` from 2.19.2 to 2.21.1 Release notes Sourced from cibuildwheel's releases. Version 2.21.1...

dependencies
python

Using sentencepiece 0.1.99 in Python 3.11.10, an out-of-range input may cause crashes depending on which other valid inputs are part of the batch: ``` >>> tkn.load(str(Path("gemma2-9b") / "tokenizer.model")) True...

bug
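
Until the crash is fixed, one defensive option on the caller's side, assuming the out-of-range values are token IDs passed to a decode call, is to check every ID against the vocabulary size before decoding a batch. In the C++ API that size can be read with `SentencePieceProcessor::GetPieceSize()`; the helpers `AllIdsInRange` and `InvalidRows` below are hypothetical and library-independent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if every ID in `ids` lies in [0, vocab_size).
bool AllIdsInRange(const std::vector<int>& ids, int vocab_size) {
  for (int id : ids) {
    if (id < 0 || id >= vocab_size) return false;
  }
  return true;
}

// Returns the indices of batch rows containing at least one out-of-range ID,
// so they can be rejected or repaired before the whole batch is decoded.
std::vector<std::size_t> InvalidRows(const std::vector<std::vector<int>>& batch,
                                     int vocab_size) {
  std::vector<std::size_t> bad;
  for (std::size_t i = 0; i < batch.size(); ++i) {
    if (!AllIdsInRange(batch[i], vocab_size)) bad.push_back(i);
  }
  return bad;
}
```

Checking per row keeps one bad sequence from affecting how the valid rows in the same batch are handled.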