Calling tiktoken.get_encoding("cl100k_base") raises "binascii.Error: Invalid base64-encoded string: number of data characters (5) cannot be 1 more than a multiple of 4"

Open · littleGnAl opened this issue 2 years ago · 0 comments

I copied this file https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py into my project and tried to use it, but I got the error below. I tried hard-coding the encoding name to "cl100k_base" and reinstalling the tiktoken package, but neither helped. Does anyone have a suggestion for how to fix it?

```
Traceback (most recent call last):
  File "/My/Project/main.py", line 157, in <module>
    asyncio.run(
  File "/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/My/Project/src/api_request_parallel_processor.py", line 173, in process_api_requests_from_file
    token_consumption=num_tokens_consumed_from_request(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/My/Project/src/api_request_parallel_processor.py", line 360, in num_tokens_consumed_from_request
    encoding = tiktoken.get_encoding("cl100k_base")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tiktoken/registry.py", line 63, in get_encoding
    enc = Encoding(**constructor())
                     ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tiktoken_ext/openai_public.py", line 64, in cl100k_base
    mergeable_ranks = load_tiktoken_bpe(
                      ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tiktoken/load.py", line 115, in load_tiktoken_bpe
    return {
           ^
  File "/usr/local/lib/python3.11/site-packages/tiktoken/load.py", line 116, in <dictcomp>
    base64.b64decode(token): int(rank)
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/base64.py", line 88, in b64decode
    return binascii.a2b_base64(s, strict_mode=validate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
binascii.Error: Invalid base64-encoded string: number of data characters (5) cannot be 1 more than a multiple of 4
```
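The last frames show where this fails: `load_tiktoken_bpe` decodes every token of the `cl100k_base` rank file with `base64.b64decode`, and a token with 5 base64 data characters (one more than a multiple of 4) can never be valid. The sketch below reproduces that error and flags the lines that would trigger it. It assumes the rank file holds one `<base64 token> <rank>` pair per line, as the `base64.b64decode(token): int(rank)` frame suggests; `find_undecodable_lines` is a hypothetical diagnostic helper, not part of tiktoken. One plausible (unconfirmed) cause is a truncated or corrupted copy of that file on disk.

```python
import base64
import binascii

# Reproduce the failure: 5 data characters is 4k + 1, which no valid
# base64 encoding can produce, so b64decode raises binascii.Error.
try:
    base64.b64decode("abcde")
except binascii.Error as exc:
    print(f"b64decode failed: {exc}")


def find_undecodable_lines(contents: bytes) -> list[tuple[int, bytes]]:
    """Return (line_number, line) for each line that would break parsing.

    Hypothetical helper. Assumes the "<base64 token> <rank>" per-line layout
    implied by the traceback; this is not tiktoken's own API.
    """
    bad = []
    for lineno, line in enumerate(contents.splitlines(), start=1):
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            bad.append((lineno, line))  # not a "token rank" pair
            continue
        token, rank = parts
        try:
            base64.b64decode(token)  # the call that fails in the traceback
            int(rank)  # int() accepts bytes such as b"2"
        except ValueError:  # binascii.Error is a subclass of ValueError
            bad.append((lineno, line))
    return bad


sample = b"IQ== 0\nIg== 1\nabcde 2\n"
print(find_undecodable_lines(sample))
```

Running the helper over the raw bytes of the rank file that tiktoken actually loaded would point at the offending line, if the assumed format holds.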

littleGnAl, Apr 17 '23 14:04