Add split_text_by_spaces string util, normalize LLMTextFrame output
Take three on a solution that handles large text outputs from Google Gemini.
This goes one step further and normalizes all LLMTextFrame output from services into individual, contiguous chunks of characters, split based on spaces. The idea is that every LLM service should output LLMTextFrames uniformly, making it easier for downstream processors to handle different services in the same way.
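For illustration, here is a minimal sketch of what a `split_text_by_spaces` utility along these lines could look like. This is a hypothetical implementation, not the actual code in the PR; the real utility's signature and chunking rules may differ.

```python
import re


def split_text_by_spaces(text: str) -> list[str]:
    """Split text into contiguous chunks, each keeping its trailing spaces.

    Hypothetical sketch: each chunk is a run of non-space characters plus
    any spaces that follow it (or a leading run of spaces), so joining the
    chunks reproduces the original text exactly.
    """
    return re.findall(r"\S+\s*|\s+", text)
```

With this behavior, a large text block from an LLM service can be re-emitted as one `LLMTextFrame` per chunk, so downstream processors see the same uniform stream regardless of which service produced the text.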
For the time being, this does not update the realtime LLM services, though I think we want those updated to work the same way. Cc @kompfner for when he's back to take a look and see if this makes sense for realtime LLMs.
Codecov Report
:x: Patch coverage is 85.84071% with 16 lines in your changes missing coverage. Please review.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/pipecat/extensions/ivr/ivr_navigator.py | 81.75% <100.00%> (-0.96%) | :arrow_down: |
| src/pipecat/utils/text/simple_text_aggregator.py | 100.00% <100.00%> (+4.34%) | :arrow_up: |
| src/pipecat/utils/text/base_text_aggregator.py | 78.12% <80.00%> (-1.19%) | :arrow_down: |
| src/pipecat/utils/text/pattern_pair_aggregator.py | 95.41% <97.50%> (+0.26%) | :arrow_up: |
| ...pecat/processors/aggregators/llm_text_processor.py | 0.00% <0.00%> (ø) | |
| src/pipecat/utils/text/skip_tags_aggregator.py | 85.18% <80.95%> (-10.97%) | :arrow_down: |
| src/pipecat/services/tts_service.py | 42.48% <22.22%> (-0.24%) | :arrow_down: |
... and 1 file with indirect coverage changes
LGTM!