
TokenizerGpt3.Encode incorrect

Open 3400442579 opened this issue 1 year ago • 1 comments

Describe the bug: TokenizerGpt3.Encode returns an incorrect token count.

Your code piece

 string text = """
                Many words map to one token, but some don't: indivisible.

                Unicode characters like emojis may be split into many tokens containing the underlying bytes: 🤚🏾

                Sequences of characters commonly found next to each other may be grouped together: 1234567890
                """;
            int n = TokenizerGpt3.Encode(text).Count;

Result = 260

Screenshots: image

Desktop (please complete the following information):

  • OS: [Windows]
  • Language: [C#]
  • Version: [6.7.0]

3400442579, Mar 03 '23 07:03

That's interesting, I thought I had solved this issue in 6.7.0. Thanks for reporting, I will have a look.

kayhantolga, Mar 03 '23 10:03

Solved in v6.7.2.

kayhantolga, Mar 05 '23 20:03