
Performance bottleneck in audio publishing with LiveKit Python SDK at scale

Open Rishi199602 opened this issue 10 months ago • 1 comments

Problem Description

I've discovered a significant performance bottleneck when publishing audio files to LiveKit rooms using the Python SDK. The SDK works well with a small number of users (around 5), but performance degrades dramatically at just 25 concurrent users, making it impractical for our target load of 1000 concurrent users.

Expected Behavior

- Publishing time should scale linearly, or at least reasonably, with the number of concurrent users
- The SDK should be able to handle hundreds of concurrent audio publishes with appropriate resource management

Code Example

```python
import logging
import time
import wave

import numpy as np
from livekit import rtc

logger = logging.getLogger(__name__)


async def publish_audio_file(self, file_path):
    async with self._audio_publish_semaphore:
        logger.info(f"Acquired audio publishing semaphore for {self.username}")
        try:
            start_time = time.time()

            # Get audio properties
            with wave.open(file_path, 'rb') as wav_reader:
                channels = wav_reader.getnchannels()
                sample_rate = wav_reader.getframerate()

            # Create audio source and track
            source = rtc.AudioSource(sample_rate, channels)
            track = rtc.LocalAudioTrack.create_audio_track("audio", source)
            options = rtc.TrackPublishOptions()
            options.source = rtc.TrackSource.SOURCE_MICROPHONE

            # Publish the track - THIS IS WHERE THE DELAY OCCURS
            publication = await self.room.local_participant.publish_track(track, options)
            logger.info(f"Published track {publication.sid}")
            publish_time = time.time()
            logger.info(f"Published track in {publish_time - start_time} seconds.")

            # Read and send audio data
            with wave.open(file_path, 'rb') as wav_file:
                num_frames = wav_file.getnframes()
                all_frames = wav_file.readframes(num_frames)

                frame = rtc.AudioFrame.create(
                    sample_rate=sample_rate,
                    num_channels=channels,
                    samples_per_channel=num_frames // channels
                )

                # Copy the WAV samples into the frame's buffer
                audio_data = np.frombuffer(frame.data, dtype=np.int16)
                frame_data = np.frombuffer(all_frames, dtype=np.int16)

                copy_length = min(len(audio_data), len(frame_data))
                np.copyto(audio_data[:copy_length], frame_data[:copy_length])

                await source.capture_frame(frame)
        finally:
            logger.info("Released audio publishing semaphore")
```

Please suggest any changes or improvements I can make here.

Rishi199602 avatar Mar 11 '25 11:03 Rishi199602

Python has a GIL, so you'd need to use multiprocessing to take advantage of multiple cores.
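
A minimal sketch of that idea: partition the users across worker processes, and run an asyncio event loop inside each process. The `publish_for_user` coroutine below is a hypothetical stand-in for the real code that would connect a participant and call `publish_audio_file`; the structure (processes on the outside, async concurrency on the inside) is the part that matters.

```python
import asyncio
from multiprocessing import Pool


async def publish_for_user(username: str) -> str:
    # Hypothetical placeholder: real code would connect to the LiveKit
    # room and publish the audio track for this user.
    await asyncio.sleep(0)
    return f"{username}: published"


def worker(usernames: list[str]) -> list[str]:
    # Each process runs its own event loop, so CPU-bound work
    # (frame copying, buffer conversion) in different processes
    # can run on different cores, unconstrained by a single GIL.
    async def run() -> list[str]:
        return await asyncio.gather(
            *(publish_for_user(u) for u in usernames)
        )
    return asyncio.run(run())


def publish_all(usernames: list[str], num_procs: int = 4) -> list[str]:
    # Partition users into one chunk per process (round-robin).
    chunks = [usernames[i::num_procs] for i in range(num_procs)]
    with Pool(num_procs) as pool:
        results = pool.map(worker, chunks)
    return [r for chunk in results for r in chunk]
```

Within each process, a semaphore (as in the original snippet) can still bound how many publishes run concurrently.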

davidzhao avatar Apr 05 '25 05:04 davidzhao