sentencepiece-jni
Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
SentencePiece Java Wrapper
Java wrapper for SentencePiece with JNI. This module wraps the sentencepiece::SentencePieceProcessor class with the following modifications:
- The Encode and Decode methods are redefined as EncodeAsIds and EncodeAsPieces, and DecodeIds and DecodePieces, respectively.
- SentencePieceText proto is not supported.
SentencePiece Version
Build and Install SentencePiece
To build and install the Java wrapper from source, run the following command:
% mvn clean install
Using sentencepiece-jni as a dependency
Because the resulting JAR is platform-dependent, resolving this dependency is managed by the os-maven-plugin. Follow the instructions there to use this platform-dependent JAR.
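As a rough sketch of what this setup could look like in a consuming project's pom.xml: the os-maven-plugin is registered as a build extension, and the property it detects, os.detected.classifier, is used as the dependency classifier. The GROUP_ID and VERSION placeholders are hypothetical and must be replaced with the coordinates actually published for this wrapper. The use of a classifier here is an assumption based on how platform-dependent JARs are usually resolved with this plugin, and the plugin version shown is only an example.

```xml
<!-- Sketch only: GROUP_ID and VERSION are placeholders, not the real coordinates. -->
<build>
  <extensions>
    <!-- Detects the OS and architecture and sets ${os.detected.classifier} -->
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.7.1</version>
    </extension>
  </extensions>
</build>

<dependencies>
  <dependency>
    <groupId>GROUP_ID</groupId>
    <artifactId>sentencepiece-jni</artifactId>
    <version>VERSION</version>
    <!-- Assumed: selects the JAR built for the current platform -->
    <classifier>${os.detected.classifier}</classifier>
  </dependency>
</dependencies>
```

With the extension in place, Maven resolves the classifier at build time, so the same pom.xml can be used on each platform for which a JAR has been published.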
Note that building from source requires a C++ compiler and CMake to be installed.
Usage
See SentencePieceProcessorTest for more.