python-sdks

How do I play a pre-recorded message in the entrypoint?

Open ylhan opened this issue 10 months ago • 3 comments

Based on the examples, it's normal to have a greeting at the end of the entrypoint. Something like:

await agent.say("Welcome, I'm a friendly assistant...", allow_interruptions=True)

This message is always the same, and regenerating it ($$$) on every call just burns tokens for no good reason. How do I play a pre-recorded message from the agent here?

I think I can reverse engineer VoicePipelineAgent.say(...) and inject a WAV file there, but I'm curious whether there's an easier way.

ylhan avatar Feb 11 '25 05:02 ylhan

Thanks, Claude. This is ugly, but it works:

import asyncio
import wave

from livekit.rtc import AudioFrame, AudioSource, LocalAudioTrack


async def play_greeting_file(local_participant, wav_path: str = "greeting.wav"):
    # Read the WAV file
    with wave.open(wav_path, "rb") as wav_file:
        # Get the WAV file properties
        sample_rate = wav_file.getframerate()
        num_channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()

        print(f"Audio properties: rate={sample_rate}, channels={num_channels}, width={sample_width}")

        # The frames below are treated as 16-bit PCM, so reject other sample widths
        if sample_width != 2:
            raise ValueError(f"expected 16-bit PCM, got {sample_width * 8}-bit audio")

        # Create an audio source with matching parameters
        audio_source = AudioSource(
            sample_rate=sample_rate,
            num_channels=num_channels,
            queue_size_ms=5000,  # 5 second buffer
        )

        # Create and publish the track
        track = LocalAudioTrack.create_audio_track("greeting", audio_source)
        await local_participant.publish_track(track)

        # Add a small delay to ensure everything is ready
        await asyncio.sleep(0.5)

        # Read and send the audio data
        chunk_size = sample_rate // 10  # frames per 100 ms chunk
        while True:
            raw_data = wav_file.readframes(chunk_size)
            if not raw_data:
                break

            # Just pass the raw PCM data; the last chunk may be shorter than the rest
            frame = AudioFrame(
                data=raw_data,
                sample_rate=sample_rate,
                num_channels=num_channels,
                samples_per_channel=len(raw_data) // (sample_width * num_channels),
            )

            await audio_source.capture_frame(frame)

        # Wait for the audio to finish playing
        await audio_source.wait_for_playout()

        # Cleanup
        await audio_source.aclose()
        await local_participant.unpublish_track(track.sid)


# Then, inside the entrypoint:
#     await play_greeting_file(ctx.room.local_participant)
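
If it helps anyone, the WAV reading and the 100 ms chunking can be pulled out of the publishing code and checked without a room. This is only a rough sketch using the standard library: `iter_pcm_chunks` and `chunk_ms` are names I made up, and it assumes the file is 16-bit PCM, which is what the code above assumes as well.

```python
import wave
from typing import Iterator, Tuple


def iter_pcm_chunks(wav_path: str, chunk_ms: int = 100) -> Iterator[Tuple[bytes, int]]:
    """Yield (pcm_bytes, samples_per_channel) for each chunk of a 16-bit PCM WAV file."""
    with wave.open(wav_path, "rb") as wav_file:
        sample_width = wav_file.getsampwidth()
        # Hypothetical guard: only 16-bit samples are handled in this sketch
        if sample_width != 2:
            raise ValueError(f"expected 16-bit PCM, got {sample_width * 8}-bit audio")

        num_channels = wav_file.getnchannels()
        # Number of frames (one sample per channel) in one chunk of chunk_ms
        frames_per_chunk = wav_file.getframerate() * chunk_ms // 1000

        while True:
            pcm = wav_file.readframes(frames_per_chunk)
            if not pcm:
                break
            # The final chunk can be shorter, so derive the count from the byte length
            yield pcm, len(pcm) // (sample_width * num_channels)
```

Each yielded pair maps onto the `data` and `samples_per_channel` arguments of the `AudioFrame` built in the loop above, so the publishing code only has to deal with the track and the audio source.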

ylhan avatar Feb 11 '25 06:02 ylhan

In the upcoming 1.0 agents release, you will be able to play audio in say:

Image

You can see the code in the agents repo under the dev-1.0 branch.

ChenghaoMou avatar Mar 12 '25 16:03 ChenghaoMou

That’s awesome!

ylhan avatar Mar 12 '25 17:03 ylhan