
[PROBLEM]

Open JackZL opened this issue 2 years ago • 0 comments


Identify the file to be fixed: apps/web-crawl-q-and-a/web-qa.ipynb

Describe the problem: In the function below, the last chunk is discarded. When the loop ends, the sentences still held in `chunk` are never appended to `chunks`. Is this intentional?

```python
def split_into_many(text, max_tokens = max_tokens):

    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):

        # If the number of tokens so far plus the number of tokens in the current sentence is greater
        # than the max number of tokens, then add the chunk to the list of chunks and reset
        # the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # If the number of tokens in the current sentence is greater than the max number of
        # tokens, go to the next sentence
        if token > max_tokens:
            continue

        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1

    return chunks
```
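For reference, here is a minimal sketch of what a fix might look like, assuming the discard is unintended. It is not the notebook's actual code. The `count_words` helper and the `count_tokens` parameter are hypothetical stand-ins for `len(tokenizer.encode(...))` so the example runs without a tokenizer, and the default of 500 for `max_tokens` is an assumption. The sketch flushes the leftover `chunk` after the loop, and it also skips appending an empty chunk, which the original would emit as a lone "." when the first sentence is oversized.

```python
from typing import Callable, List


def count_words(text: str) -> int:
    # Hypothetical stand-in for len(tokenizer.encode(text)):
    # counts whitespace-separated words instead of real tokens.
    return len(text.split())


def split_into_many(
    text: str,
    max_tokens: int = 500,  # assumed default; the original reads a global
    count_tokens: Callable[[str], int] = count_words,
) -> List[str]:
    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [count_tokens(" " + sentence) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    for sentence, token in zip(sentences, n_tokens):
        # Flush the current chunk if this sentence would push it past the limit.
        # The `and chunk` guard avoids appending an empty chunk.
        if tokens_so_far + token > max_tokens and chunk:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # Skip sentences that exceed the limit on their own (same as the original)
        if token > max_tokens:
            continue

        chunk.append(sentence)
        tokens_so_far += token + 1

    # The missing step: flush whatever is still in `chunk` after the loop
    if chunk:
        chunks.append(". ".join(chunk) + ".")

    return chunks


if __name__ == "__main__":
    text = "one two three. four five six. seven eight nine"
    print(split_into_many(text, max_tokens=7))
```

With the word-count stand-in and `max_tokens=7`, the original function would return only the first chunk for this input, while the sketch also returns the trailing `"seven eight nine."`. To use it with the notebook's tokenizer, something like `count_tokens=lambda s: len(tokenizer.encode(s))` could be passed instead.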

Describe a solution: None

JackZL, May 23 '23 08:05