
WebRTC: Crackling, buzzing noise when transcoding AAC to opus with FFmpeg's built-in opus encoder

Open chundonglinlin opened this issue 2 years ago • 1 comments



Patch Commit ID:

SRS Version: develop v5.0.36

  1. Build FFmpeg
  • With FFmpeg's built-in opus: --enable-decoder=opus --enable-encoder=opus
  • With the libopus library: --enable-libopus
  2. SRS Config:
listen              1935;
max_connections     1000;
daemon              off;
srs_log_tank        console;

http_server {
    enabled         on;
    listen          8080;
    dir             ./objs/nginx/html;
}

http_api {
    enabled         on;
    listen          1985;
}

rtc_server {
    enabled on;
    listen 8000; # UDP port
    # @see
    #candidate $CANDIDATE;
}

vhost __defaultVhost__ {
    rtc {
        enabled     on;
        # @see
        rtmp_to_rtc on;
        # @see
        rtc_to_rtmp on;
    }
    http_remux {
        enabled     on;
        mount       [vhost]/[app]/[stream].flv;
    }
}


Please describe how to reproduce the bug:

  1. Publish the stream over RTMP: ffmpeg -stream_loop -1 -re -i 264_aac_basline_48k.mp4 -c copy -f flv "rtmp://"

  2. Play over RTC: open the player at https:// and play:



chundonglinlin avatar Aug 09 '22 02:08 chundonglinlin


rm -rf objs/ffmpeg/*

winlinvip avatar Aug 10 '22 01:08 winlinvip

To use FFmpeg's built-in (native) opus codec, pass this configure option:

  --ffmpeg-opus=on|off      Whether enable the FFmpeg native opus codec. Default: off

After changing the opus library, you need to delete FFmpeg and recompile it.

rm -rf objs
./configure --ffmpeg-opus=on


winlinvip avatar Jan 06 '23 09:01 winlinvip

The main reason is that the frame_size set in ffmpeg's opusenc.c (note, not libopusenc.c) is 120, and opus has a sampling rate of 48000, which means each frame is 2.5ms.
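The frame duration above follows directly from the frame size and the sample rate. A minimal sketch of that arithmetic, where the helper names are hypothetical and the 960-sample value used for comparison assumes a 20 ms opus frame at 48000 Hz:

```cpp
#include <cassert>

// Duration of one audio frame in milliseconds.
// frame_size: samples per channel in one frame; sample_rate: samples per second.
double frame_duration_ms(int frame_size, int sample_rate) {
    assert(frame_size > 0 && sample_rate > 0);
    return 1000.0 * frame_size / sample_rate;
}

// Number of frames the encoder must consume per second of audio.
double frames_per_second(int frame_size, int sample_rate) {
    assert(frame_size > 0 && sample_rate > 0);
    return static_cast<double>(sample_rate) / frame_size;
}
```

With these helpers, frame_duration_ms(120, 48000) is 2.5 and frames_per_second(120, 48000) is 400, while a 960-sample frame gives 20 ms and 50 frames per second.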

From my testing, it seems that approximately every 5 input frames are needed to obtain one output frame from the encoder. Moreover, the latency is high and the size is small, as shown in the following figure.

image The PCM frames from pts 2936 to 2954 are input to the encoder, and the frame at pts 2589 is the output from the encoder. It can be observed that there is a significant delay of 300ms+ and the size is very small.

Any experts who know how to fix this issue, please help and reply to this problem.


xiaozhihong avatar Mar 23 '23 13:03 xiaozhihong

Is it possible to fix it by upgrading FFmpeg to 5.1?

winlinvip avatar Mar 23 '23 13:03 winlinvip

> Is it possible to fix it by upgrading FFmpeg to 5.1?

I tested the latest FFmpeg release, 5.1.3, and the problem is already solved.

SRS will update FFmpeg from 4.x to 5.1.3.

xiaozhihong avatar Mar 29 '23 08:03 xiaozhihong

Following the test method described above and switching to FFmpeg 5.1.3, the loud crackling noise still persists.

I added debugging logs:

srs_error_t SrsAudioTranscoder::decode_and_resample(SrsAudioFrame *pkt)
{
    srs_error_t err = srs_success;

    dec_packet_->data = (uint8_t *)pkt->samples[0].bytes;
    dec_packet_->size = pkt->samples[0].size;

    srs_trace("decode_and_resample: dec_packet_->size=%d", dec_packet_->size);

    // Ignore empty packet, see
    if (!dec_packet_->data || !dec_packet_->size) {
        return err;
    }

    char err_buf[AV_ERROR_MAX_STRING_SIZE] = {0};
    int error = avcodec_send_packet(dec_, dec_packet_);
    if (error < 0) {
        return srs_error_new(ERROR_RTC_RTP_MUXER, "submit to dec(%d,%s)", error,
            av_make_error_string(err_buf, AV_ERROR_MAX_STRING_SIZE, error));
    }

    new_pkt_pts_ = pkt->dts + pkt->cts;
    while (error >= 0) {
        error = avcodec_receive_frame(dec_, dec_frame_);
        if (error == AVERROR(EAGAIN) || error == AVERROR_EOF) {
            return err;
        } else if (error < 0) {
            return srs_error_new(ERROR_RTC_RTP_MUXER, "Error during decoding(%d,%s)", error,
                av_make_error_string(err_buf, AV_ERROR_MAX_STRING_SIZE, error));
        }

        // Decoder is OK now, try to init swr if not initialized.
        if (!swr_ && (err = init_swr(dec_)) != srs_success) {
            return srs_error_wrap(err, "resample init");
        }

        int in_samples = dec_frame_->nb_samples;
        const uint8_t **in_data = (const uint8_t**)dec_frame_->extended_data;
        int idx = 0; // Debug counter: swr_convert calls for this decoded frame.
        do {
            /* Convert the samples using the resampler. */
            int frame_size = swr_convert(swr_, swr_data_, enc_->frame_size, in_data, in_samples);
            if ((error = frame_size) < 0) {
                return srs_error_new(ERROR_RTC_RTP_MUXER, "Could not convert input samples(%d,%s)", error,
                    av_make_error_string(err_buf, AV_ERROR_MAX_STRING_SIZE, error));
            }
            srs_trace("idx=%d in_samples=%d, enc_->frame_size=%d, frame_size=%d", idx++, in_samples, enc_->frame_size, frame_size);

            // Input is consumed by the first call; later calls only drain the resampler.
            in_data = NULL; in_samples = 0;
            if ((err = add_samples_to_fifo(swr_data_, frame_size)) != srs_success) {
                return srs_error_wrap(err, "write samples");
            }
        } while (swr_get_out_samples(swr_, in_samples) >= enc_->frame_size);
    }

    return err;
}


  • With libopus, swr_convert is called once for each decoded frame.
  • With ffmpeg-opus, swr_convert is called 8 or 9 times for each decoded frame. image
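The call counts above can be reproduced with a simplified model of the do/while loop in decode_and_resample. This is only a sketch under stated assumptions: the function name convert_calls_per_frame is hypothetical, each decoded AAC frame is assumed to carry 1024 samples, no sample-rate conversion takes place (so samples in equal samples out), and the libopus frame size is assumed to be 960 samples (20 ms at 48000 Hz). It does not call FFmpeg.

```cpp
#include <cassert>
#include <vector>

// Simplified model of the resampling loop: for each decoded input frame,
// count how many swr_convert-style calls the do/while loop would make.
// Assumption: no rate change, so the resampler only buffers samples.
std::vector<int> convert_calls_per_frame(int in_samples, int enc_frame_size, int frames) {
    assert(in_samples > 0 && enc_frame_size > 0 && frames >= 0);
    std::vector<int> calls;
    int buffered = 0; // samples held inside the modeled resampler
    for (int i = 0; i < frames; ++i) {
        buffered += in_samples; // the first call feeds the whole decoded frame
        int n = 0;
        do {
            // Each call drains at most enc_frame_size samples.
            int out = buffered < enc_frame_size ? buffered : enc_frame_size;
            buffered -= out;
            ++n;
        } while (buffered >= enc_frame_size); // mirrors the swr_get_out_samples check
        calls.push_back(n);
    }
    return calls;
}
```

Under these assumptions, convert_calls_per_frame(1024, 120, 3) yields 8, 9, 8 and convert_calls_per_frame(1024, 960, 3) yields 1, 1, 1, which matches the logged behavior: a 120-sample encoder frame forces many small drains per decoded frame.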

Audio encoder initialization: image


chundonglinlin avatar Oct 18 '23 16:10 chundonglinlin