Is there a plan to implement GPT2TokenizerFast?

Open zjcDM opened this issue 2 years ago • 4 comments

zjcDM avatar Mar 08 '23 08:03 zjcDM

It is already implemented.

zjcDM avatar Mar 08 '23 08:03 zjcDM

Use this approach:

1. pom configuration

    <dependency>
        <groupId>ai.djl.huggingface</groupId>
        <artifactId>tokenizers</artifactId>
        <version>0.19.0</version>
    </dependency>

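2. Example code

# Declaration
// Requires imports of ai.djl.huggingface.tokenizers.HuggingFaceTokenizer, java.io.IOException, and java.util.List.
// `manager` (an NDManager) and `MAX_LENGTH` (an int) are assumed to be defined earlier in the same class.
private static final HuggingFaceTokenizer tokenizer;

static {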
    try {
        tokenizer =
                HuggingFaceTokenizer.builder()
                        .optManager(manager)
                        .optPadding(true)
                        .optPadToMaxLength()
                        .optMaxLength(MAX_LENGTH)
                        .optTruncation(true)
                        .optTokenizerName("openai/clip-vit-large-patch14")
                        .build();
        // Tokenizer names noted here for reference:
        //   sentence-transformers/msmarco-distilbert-dot-v5
        //   openai/clip-vit-large-patch14
        // Related links:
        //   https://huggingface.co/sentence-transformers/msmarco-distilbert-dot-v5
        //   https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/tokenizer/tokenizer_config.json
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

# Usage
// `prompt` is the input text (a String) to split into tokens.
List<String> tokens = tokenizer.tokenize(prompt);
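
To make the builder options above more concrete, here is a minimal plain-Java sketch of what truncation combined with pad-to-max-length does to a sequence of token IDs. It is a conceptual illustration only, not DJL's implementation: the class name `PadTruncateSketch`, the methods `fit` and `attentionMask`, and the pad ID of 0 are all hypothetical, and the sketch assumes right-side padding and truncation while ignoring special tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only (hypothetical names, NOT the DJL implementation):
// shows the effect of truncating and then padding token IDs to a fixed length.
class PadTruncateSketch {

    // Returns a copy of ids with exactly maxLength entries.
    // Longer inputs are cut at the end; shorter inputs are right-padded with padId.
    static List<Integer> fit(List<Integer> ids, int maxLength, int padId) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be non-negative");
        }
        int keep = Math.min(ids.size(), maxLength);
        List<Integer> out = new ArrayList<>(ids.subList(0, keep));
        while (out.size() < maxLength) {
            out.add(padId);
        }
        return out;
    }

    // Builds a mask with 1 for positions holding real tokens and 0 for padding.
    static List<Integer> attentionMask(int realLength, int maxLength) {
        List<Integer> mask = new ArrayList<>();
        for (int i = 0; i < maxLength; i++) {
            mask.add(i < realLength ? 1 : 0);
        }
        return mask;
    }

    public static void main(String[] args) {
        int maxLength = 5; // hypothetical value chosen for the demo
        System.out.println(fit(List.of(1, 2, 3), maxLength, 0));             // [1, 2, 3, 0, 0]
        System.out.println(fit(List.of(1, 2, 3, 4, 5, 6, 7), maxLength, 0)); // [1, 2, 3, 4, 5]
        System.out.println(attentionMask(3, maxLength));                     // [1, 1, 1, 0, 0]
    }
}
```

With a fixed output length, every encoded prompt has the same shape, which is typically why padding to the maximum length and truncation are enabled together.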

mymagicpower avatar Mar 08 '23 09:03 mymagicpower

https://github.com/deepjavalibrary/djl/blob/master/extensions/tokenizers/README.md

mymagicpower avatar Mar 08 '23 09:03 mymagicpower

https://github.com/deepjavalibrary/djl/blob/master/extensions/tokenizers/README.md

Hi, it seems this doesn't allow a custom vocabulary?

zjcDM avatar Apr 03 '23 09:04 zjcDM