Fixed bug for subsequent chunks
Oh yes, clearly a bug. I think the rewrite has to be larger (the `if` statement shouldn't add `ntoken` again, and there's no need for the +2 on the first chunk of a batch):
" for chunk, ntoken in zip(chunks, ntokens):\n",
" cur_tokens += ntoken + 2 # +2 for the newlines between chunks\n",
"\n",
" # if adding this chunk would exceed the max length, finalize the current batch and start a new one\n",
" if cur_tokens > max_len:\n",
" batches.append(cur_batch)\n",
" cur_batch = chunk\n",
" cur_tokens = ntoken\n",
" else:\n",
" cur_batch += \"\\n\\n\" + chunk\n",
" batches.append(cur_batch)\n",
Fixed in https://github.com/openai/openai-cookbook/pull/579. Closing PR. Thanks for flagging!
(Made a couple of other changes, including changing the +2 tokens to +1, since `\n\n` is a single token in the GPT-3 encoding.)
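For the curious, that single-token claim is easy to check; this is a sketch assuming the `tiktoken` package, where `r50k_base` is the encoding used by the original GPT-3 models:

```python
import tiktoken

# r50k_base is the byte-pair encoding used by the original GPT-3 models
enc = tiktoken.get_encoding("r50k_base")
tokens = enc.encode("\n\n")
print(tokens, len(tokens))  # one token id, so +1 per "\n\n" separator is right
```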