feat: add Vonage Audio Connector integration (serializer, transport, foundational example)
Summary
This PR introduces the Vonage Audio Connector integration: a custom serializer, the VonageAudioConnectorTransport and VonageAudioConnectorOutputTransport, and a foundational example.
Changes
- Added foundational example: examples/foundational/49-vonage-audio-connector-openai.py
- Added VonageFrameSerializer under src/pipecat/serializers/vonage.py
- Added VonageAudioConnectorTransport and VonageAudioConnectorOutputTransport under src/pipecat/transports/vonage/audio_connector.py
- Added new package folder src/pipecat/transports/vonage/ with __init__.py
- Updated env.example
- Updated pyproject.toml and uv.lock
Why This Is Needed
This integration enables Pipecat to work with the Vonage Voice API Audio Connector, supporting real-time STT → LLM → TTS pipelines, and expands the ecosystem of community-maintained integrations.
Testing
- Basic end-to-end pipeline validated (audio in → STT → LLM → TTS → audio out)
- Serializer and transport tested for encoding/decoding correctness
- Verified pacing behavior (sleep-per-chunk timing) matches Vonage Audio Connector requirements
- Confirmed WAV-header wrapping when enabled
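To illustrate the WAV-header wrapping item above, here is a minimal sketch using only the Python standard library. The helper name wrap_pcm_in_wav and the 16 kHz, 16-bit mono defaults are assumptions for illustration; they are not taken from the serializer in this PR.

```python
import io
import wave


def wrap_pcm_in_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Prepend a RIFF/WAV header to raw 16-bit PCM audio (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit PCM (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # The wave module does not close a file object it did not open itself,
    # so the buffer is still readable here.
    return buf.getvalue()
```

Reading the result back with wave.open should return the original PCM bytes unchanged, which is one simple way to check the wrapping.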
Hi @jamsea
I’ve created the PR for the Vonage Audio Connector integration (serializer, transport, foundational example).
Please take a look whenever you get a chance — happy to make any changes needed. Thanks!
I’ve just pushed a follow-up commit to switch the foundational example from the dev OpenTok API URL to the production https://api.opentok.com.
Hi @markbackman and @filipi87, could you please find some time to review this PR?
Sorry for the delay. We're backlogged on PR reviews. I took a quick look at this and think it's a good plan to split it up. First, can you create a PR for only the VonageFrameSerializer? Along with this, it would be helpful to submit an example for pipecat-examples showing how to dial-in and dial-out. This would be similar to the examples that exist for Twilio, Telnyx, Plivo, and Exotel.
That's a big enough change to add and test that I think we should start there. It will also help developers get started right away as they can easily test and run the example. WDYT?
The VonageFrameSerializer should be written to work with the FastAPIWebsocketTransport. Is there a reason to add a new websocket transport to work specifically with the VonageFrameSerializer?
Hi @markbackman thank you so much for your initial review comments. Please find the reasons to keep the transport + foundational example along with VonageFrameSerializer:
- Regarding splitting the PR: in this case the VonageFrameSerializer cannot be meaningfully reviewed or tested on its own. It requires the accompanying Vonage-specific WebSocket transport and the foundational example. All three pieces form a single atomic unit:
  a) The serializer and transport are tightly coupled because the Vonage Audio Connector expects specific binary framing, sequencing, and pacing.
  b) Without the transport, the serializer cannot be executed.
  c) Without the example, there’s no runnable validation for reviewers.
- If you check out this branch, everything works end-to-end with the current serializer + transport + example. Splitting them would make the serializer untestable in isolation and make the PR harder to validate.
- On the dial-in/dial-out point — Vonage’s workflow differs from Twilio/Telnyx/Plivo/Exotel, so the foundational example here is the correct equivalent for Vonage. It demonstrates the Audio Connector flow as the intended usage pattern.
- Regarding FastAPIWebsocketTransport: the Vonage Audio Connector requires low-level binary frame control (opcodes, sequence numbers, 20 ms chunk pacing), which the existing transport doesn’t expose. The custom transport keeps this logic isolated without modifying core transports.
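For readers unfamiliar with chunk pacing, the sketch below shows the general idea of splitting PCM audio into 20 ms chunks and sending them at real-time speed. It is not the transport code from this branch. The 16 kHz, 16-bit mono format and the names chunk_size_bytes, split_pcm, and send_paced are assumptions, and it sleeps until a running deadline instead of a fixed interval per chunk, a substitution made here to limit timing drift.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterator

SAMPLE_RATE = 16000    # assumed sample rate; not stated in this thread
BYTES_PER_SAMPLE = 2   # assumed 16-bit linear PCM
CHUNK_MS = 20          # chunk duration mentioned above


def chunk_size_bytes(sample_rate: int = SAMPLE_RATE, chunk_ms: int = CHUNK_MS) -> int:
    """Number of bytes in one chunk of mono PCM audio."""
    return sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000


def split_pcm(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most `size` bytes."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


async def send_paced(pcm: bytes, send: Callable[[bytes], Awaitable[None]]) -> int:
    """Send chunks through `send`, one per CHUNK_MS; return the number sent."""
    interval = CHUNK_MS / 1000
    next_deadline = time.monotonic()
    sent = 0
    for chunk in split_pcm(pcm, chunk_size_bytes()):
        await send(chunk)
        sent += 1
        # Sleep until the next deadline so send latency does not accumulate.
        next_deadline += interval
        await asyncio.sleep(max(0.0, next_deadline - time.monotonic()))
    return sent
```

Here `send` stands in for whatever coroutine writes a binary message to the websocket; any async callable that accepts bytes would work.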
Happy to iterate further, but keeping these three components together ensures the reviewer can run and validate the integration immediately.
Additionally, today I created two PRs in the pipecat-examples repository:
- https://github.com/pipecat-ai/pipecat-examples/pull/129
- https://github.com/pipecat-ai/pipecat-examples/pull/130

These examples require the vonage-audio-connector dependency. This PR adds that dependency to pyproject.toml in the main Pipecat repository, which the examples rely on.
Hi @markbackman and @filipi87
I’ve rebased the feature branch onto the latest main to resolve conflicts and verify the changes against the current Pipecat codebase.
I also renumbered the foundational example from 49-* to 50-*, since 49 was already in use.
To try it out, install the optional dependencies and run it the same way as other foundational examples:
uv run examples/foundational/50-vonage-audio-connector-openai.py
Please ensure the required OpenAI and Vonage environment variables are set (via .env).
If running locally, you can use:
ngrok http 8005
to obtain the wss URL and set it in the Vonage-related environment variables.
Thanks for taking a look!
Sorry for the delay on this review. It's been a busy week!
I kept thinking about your proposal and really wanted to avoid adding a new transport. Instead, I spent a little bit of time looking at how to implement this within the existing FastAPIWebsocketTransport constraints. Check out this PR:
https://github.com/pipecat-ai/pipecat/pull/3265
It adds a new mode for handling text and binary messages to the FastAPIWebsocketTransport. It also adds a new VonageFrameSerializer.
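To make the text-versus-binary distinction concrete, here is a rough sketch; it is not the code in that PR. It assumes ASGI-style receive messages (a dict carrying either a "text" or a "bytes" entry), that binary messages hold raw audio, and that text messages hold JSON events. The name classify_message is hypothetical.

```python
import json
from typing import Any, Dict, Tuple, Union


def classify_message(message: Dict[str, Any]) -> Tuple[str, Union[bytes, Dict[str, Any]]]:
    """Return ("audio", bytes) for binary messages or ("event", dict) for text ones."""
    if message.get("bytes") is not None:
        # Binary payload: treated as raw audio (assumption).
        return "audio", message["bytes"]
    if message.get("text") is not None:
        # Text payload: treated as a JSON-encoded event (assumption).
        return "event", json.loads(message["text"])
    raise ValueError("message carries neither text nor bytes")
```

A serializer could then route the "audio" branch to audio frames and the "event" branch to control handling.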
I'd propose this: let's work on PR #3265 and get the core of this work implemented. I see you have more features for the serializer in your PR. Once #3265 is merged, you can follow up with a PR to add auto hangup and any other desired features to the serializer. Does that make sense?
Also, we don't need the foundational example. We do need a pipecat-example for this. In building this out myself, I wrote the inbound example: https://github.com/pipecat-ai/pipecat-examples/pull/133
I'd love feedback on it. Also, we'll need an outbound example, which I'm happy to have you contribute.
How does this all sound?