
examples simple-rag bug: split_paragraphs isn't working correctly

Open Kael-DWT opened this issue 4 months ago • 4 comments

livekit.agents.tokenize._basic_paragraph.split_paragraphs

def split_paragraphs(text: str) -> list[tuple[str, int, int]]:
    """
    Split the text into paragraphs.
    Returns a list of paragraphs with their start and end indices of the original text.
    """
    matches = re.finditer(r"\n{2,}", text)
    paragraphs = []

    for match in matches:
        paragraph = match.group(0)
        start_pos = match.start()
        end_pos = match.end()
        paragraphs.append((paragraph.strip(), start_pos, end_pos))

    return paragraphs
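
A minimal way to reproduce the behavior, as a sketch: the function below is a copy of the one quoted above with `import re` added, and the `sample` string is made up for illustration. Because the pattern matches only the runs of newlines, `match.group(0)` is the separator itself and `strip()` reduces it to an empty string.

```python
import re


def split_paragraphs(text: str) -> list[tuple[str, int, int]]:
    # Copy of the function quoted in this issue, unchanged apart from the import.
    matches = re.finditer(r"\n{2,}", text)
    paragraphs = []

    for match in matches:
        paragraph = match.group(0)  # this is the "\n\n" separator, not the text
        start_pos = match.start()
        end_pos = match.end()
        paragraphs.append((paragraph.strip(), start_pos, end_pos))

    return paragraphs


# Hypothetical input: three paragraphs separated by blank lines.
sample = "first paragraph\n\nsecond paragraph\n\n\nthird paragraph"
result = split_paragraphs(sample)

# One tuple per separator; the text field of every tuple is empty.
print(result)
```

With three paragraphs in the input, only two tuples come back (one per separator), and neither contains any paragraph text.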

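Is this regex written incorrectly? It matches only the blank-line separators, so every returned paragraph is an empty string once it is stripped. I think it should be like this: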

matches = re.finditer(r".+\n{2,}", text)
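
One caveat with the `.+\n{2,}` pattern: `.` does not match a newline by default, so for a multi-line paragraph only its last line is captured, and a final paragraph with no trailing blank line is not matched at all. Below is a rough sketch of an alternative that treats the newline runs as separators and slices the text between them. The name `split_paragraphs_sketch` and the `sample` string are hypothetical, and this is not necessarily how the library should or does fix it.

```python
import re


def split_paragraphs_sketch(text: str) -> list[tuple[str, int, int]]:
    """Hypothetical alternative: slice the text between runs of 2+ newlines."""
    # Separator spans, plus a zero-width sentinel at the end so the last
    # paragraph is emitted even without a trailing blank line.
    boundaries = [(m.start(), m.end()) for m in re.finditer(r"\n{2,}", text)]
    boundaries.append((len(text), len(text)))

    paragraphs = []
    start = 0
    for sep_start, sep_end in boundaries:
        chunk = text[start:sep_start]
        stripped = chunk.strip()
        if stripped:
            # Shift the start index past any leading whitespace that was stripped.
            begin = start + (len(chunk) - len(chunk.lstrip()))
            paragraphs.append((stripped, begin, begin + len(stripped)))
        start = sep_end
    return paragraphs


# Hypothetical input: a two-line paragraph, a one-line paragraph, and a
# final paragraph with no trailing newlines.
sample = "line one\nline two\n\nsecond paragraph\n\n\nlast paragraph"

# What the pattern proposed above captures on the same input.
proposed = [m.group(0).strip() for m in re.finditer(r".+\n{2,}", sample)]

sketch = split_paragraphs_sketch(sample)
print(proposed)
print(sketch)
```

In this sketch the returned indices point at the stripped paragraph text, so `sample[start:end]` gives back each paragraph; whether the indices should instead include the separators is a design choice the original docstring leaves open.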

Kael-DWT, Oct 09 '24 03:10