machinelearning icon indicating copy to clipboard operation
machinelearning copied to clipboard

[Tokenizers] Port CodeGenTokenizer & byte-level BPE algorithm

Open ericstj opened this issue 1 year ago • 0 comments

Port Codegen Tokenizer to enable Phi-2 models

Reference: https://huggingface.co/docs/transformers/main/en/model_doc/codegen https://github.com/huggingface/transformers/blob/0549000c5bf6c7249f411917f2a6f0b6d0f06da1/src/transformers/models/codegen/tokenization_codegen.py#L98

Paper: https://arxiv.org/abs/2203.13474 https://arxiv.org/pdf/2203.13474.pdf

ericstj avatar Feb 08 '24 18:02 ericstj