sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
Fixes a future deprecation: ``` CMake Deprecation Warning at lib/sentencepiece/CMakeLists.txt:15 (cmake_minimum_required): Compatibility with CMake < 3.5 will be removed from a future version of CMake. Update the VERSION argument value or...
After building for iOS 17, I get this error when loading a model: ``` sentencepiece::SentencePieceProcessor srcSPM, destSPM; auto loadSrcStatus = srcSPM.Load(modelPathStr + "/source.spm"); auto loadDestStatus = destSPM.Load(modelPathStr + "/target.spm"); ```...
Hi, is there any way to define a set of sub-words that should not be split but are still considered for token generation? This is especially required for phonetically rich languages...
Bumps the github-actions group with 6 updates: | Package | From | To | | --- | --- | --- | | [actions/upload-artifact](https://github.com/actions/upload-artifact) | `3.1.3` | `4.3.3` | | [actions/checkout](https://github.com/actions/checkout)...
`C:\Users\Ali>pip install esptool Collecting esptool Using cached esptool-4.7.0.tar.gz (285 kB) Preparing metadata (setup.py) ... done Collecting bitstring>=3.1.6 (from esptool) Using cached bitstring-4.1.4-py3-none-any.whl.metadata (5.8 kB) Collecting cryptography>=2.1.4 (from esptool) Using cached...
Current commit: 4d6a1f41069c4636c51a5590f7578a0dbed83450 Running the following in a clean `ubuntu:latest` docker container ```shell apt update apt install -y cmake build-essential pkg-config libgoogle-perftools-dev git gdb cd /tmp git clone https://github.com/google/sentencepiece.git cd...
Bumps the github-actions group with 6 updates in the / directory: | Package | From | To | | --- | --- | --- | | [actions/upload-artifact](https://github.com/actions/upload-artifact) | `3.1.3` |...
Bumps the build-time-deps group in /.github/workflows/requirements with 4 updates: [cibuildwheel](https://github.com/pypa/cibuildwheel), [twine](https://github.com/pypa/twine), [pip](https://github.com/pypa/pip) and [setuptools](https://github.com/pypa/setuptools). Updates `cibuildwheel` from 2.18.1 to 2.19.1 Release notes Sourced from cibuildwheel's releases. Version 2.19.1 🐛 Don't...
Even though this issue appears to have been resolved by #629, I still see the zero-width joiner being replaced with whitespace for the Sinhala language. Is there a solution for that?
Sentencepiece has no typing information, which makes it hard to work with. For example: