
Vocabulary/ GPT2 : Bad interpretation of tokenId = 216

Open agourdel opened this issue 9 months ago • 0 comments

Describe the issue as clearly as possible:

TokenId(216) of the GPT2 alphabet, whose value is "\u011c", has only the single byte 28 in its Vec in the Vocabulary. Byte 28 is '\x1C', so the alphabet may be handled incorrectly when it is loaded.
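For context, here is a minimal standalone sketch of the two byte sequences that could be associated with this token. It does not use outlines-core. It assumes the GPT-2 byte-level convention (the `bytes_to_unicode` table from the original GPT-2 encoder), and the helper names `gpt2_byte_to_char`, `utf8_bytes` and `gpt2_raw_bytes` are hypothetical, introduced only for illustration.

```rust
// Sketch only: reimplements the GPT-2 byte-to-character table by hand.
// None of these helpers come from outlines-core; the names are hypothetical.

/// Map a raw byte to the character GPT-2 uses for it in its vocabulary file.
fn gpt2_byte_to_char(byte: u8) -> char {
    // Bytes in these ranges are kept as the character with the same code point.
    let is_printable = |b: u8| matches!(b, 33..=126 | 161..=172 | 174..=255);
    if is_printable(byte) {
        return byte as char;
    }
    // Every other byte is assigned code point 256, 257, ... in byte order.
    let offset = (0..byte).filter(|&b| !is_printable(b)).count() as u32;
    char::from_u32(256 + offset).expect("256..=323 are valid code points")
}

/// Interpretation 1: the plain UTF-8 encoding of the token character.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Interpretation 2: the raw byte(s) that GPT-2 renders as this character.
fn gpt2_raw_bytes(c: char) -> Vec<u8> {
    (0..=255u8).filter(|&b| gpt2_byte_to_char(b) == c).collect()
}

fn main() {
    let token = '\u{011C}';
    println!("UTF-8 bytes:      {:02X?}", utf8_bytes(token));
    println!("GPT-2 raw bytes:  {:?}", gpt2_raw_bytes(token));
}
```

Under that assumption, the UTF-8 reading gives 0xC4, 0x9C, while the byte-level reading gives the single byte 28. Which of the two the Vocabulary is supposed to store is the open question here.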

Steps/code to reproduce the bug:

//

Expected result:

TokenId(216) = vec![0xC4, 0x9C];

Error message:


Outlines/Python version information:

Version information

``` (command output here) ```

Context for the issue:

No response

agourdel, Mar 12 '25 15:03