pjproject icon indicating copy to clipboard operation
pjproject copied to clipboard

Asynchronous conference bridge operation

Open nanangizz opened this issue 1 year ago • 5 comments

In #3183, video conference bridge adopts asynchronous ports operations to handle non-uniform lock ordering issues. This PR integrates the same mechanism for audio conference bridge.

Beside avoiding deadlock, this change may improve performance a little bit as audio data processing in the conference bridge will be performed without holding a mutex.

There are some pending tasks (e.g: integrate this into audio switchboard).

Potential backward compatibility issues (major):

  1. Pool release immediately after port removal, consider this code:

    pool = pj_pool_create(...);
    pjmedia_tonegen_create(pool, ..., &tonegen_port);
    pjmedia_conf_add_port(conf, ..., tonegen_port, ..., &tonegen_port_id);
    ...
    pjmedia_conf_remove_port(conf, tonegen_port_id);
    pjmedia_port_destroy(tonegen_port);
    pj_pool_release(pool); /* <-- This is now a problem!
                                  It was fine with a synchronous pjmedia_conf_remove_port() */
    

    Note that the PJMEDIA tonegen port does not have its own pool (it allocates its own instances and all internal states using the supplied memory pool), so releasing the pool immediately after removing the port from conference bridge may cause crash as the port removal is now asynchronous and it may be still being accessed by the conference bridge.

    Unfortunately, the real removal operation will not invoke any callback, so application cannot be so sure the safe time for releasing the pool. FYI the actual removal is done by the conference bridge clock which usually is a sound device, so if the clock is running, adding pj_thread_sleep(100) before relasing pool may avoid the crash (note that clock interval is usually 20ms, considering possible audio device burst, 100ms delay is generally sufficient).

    Now all PJMEDIA ports have been updated to have their own pool, except bidirectional & stereo ports (currently not used). Any third-party PJMEDIA ports that are registered to the conference bridge will need to be made sure that they have their own pools.

  2. PJMEDIA Stream pattern. PJMEDIA Stream exports a PJMEDIA port interface which can be queried using pjmedia_stream_get_port(). Unfortunately the port does not implements pjmedia_port_destroy(), it is destroyed using pjmedia_stream_destroy() instead, which will release the memory pool. Similar to above issue, if pjmedia_stream_destroy() is invoked immediately after pjmedia_conf_remove_port(), it may cause crash.

    Now PJMEDIA Stream has a group lock, so pjmedia_stream_destroy() will just decrement the reference counter, the real destroy will be done when the reference counter reaches zero.

  3. Premature caching pool destroy. When all PJMEDIA ports have their own pools, there is still possibility that application uses its own pool factory for creating the ports and destroy the pool factory (e.g: invoking pj_caching_pool_destroy()) prematurely before the actual port removal is done. For example, in application shutdown, the conference bridge clock has been stopped so actual port removal can only be done later by pjmedia_conf_destroy()/pjsua_destroy(), but in between, application may destroy its pool factory which may cause crash.

    A possible solution is to delay the pool factory destroy to PJLIB deinitialization via pj_atexit().

  4. Premature pool release when pool created using pjsua_pool_create(). Similar to premature caching pool destroy above, any application pool that is used for creating PJMEDIA ports needs to be released after the conference bridge (PJSUA internal object) is destroyed, this can be accomplished by scheduling the pool release using pjsip_endpt_atexit() (don't use pj_atexit() as it will be too late, PJSUA destroys its caching pool factory before calling pj_shutdown()), e.g:

    static pj_pool_t *app_pool;
    
    static void app_cleanup(pjsip_endpoint *endpt) {
      PJ_UNUSED_ARG(endpt);
      pj_pool_safe_release(&app_pool);
    }
    
    int main() {
      app_pool = pjsua_pool_create(...);
      pjsip_endpt_atexit(pjsua_get_pjsip_endpt(), &app_cleanup);
      ...
    }
    

nanangizz avatar Apr 16 '24 07:04 nanangizz

Since issue 2 is already handled, I assume it's no longer a problem (i.e. no longer causes backward compatibility)?

Which only leaves issue 1. I understand it will be quite a lot to change all existing pjmedia_port(s) to have its own pool, but since it's necessary for this feature, I vote to continue.

And if 1&2 are already resolved, the change in pjsua_app.c is no longer necessary, correct? (i.e. app will not need to change anything).

sauwming avatar Apr 16 '24 08:04 sauwming

Since issue 2 is already handled, I assume it's no longer a problem (i.e. no longer causes backward compatibility)?

Which only leaves issue 1. I understand it will be quite a lot to change all existing pjmedia_port(s) to have its own pool, but since it's necessary for this feature, I vote to continue.

The backward compatibility issues cannot be completely avoided, e.g:

  • if any custom/proprietary pjmedia ports are used, need to check/update them,
  • applications may need to be adjusted to avoid issue 3 (pool factory premature destroy).

And if 1&2 are already resolved, the change in pjsua_app.c is no longer necessary, correct? (i.e. app will not need to change anything).

Yes for pjsua_app.c, other apps may need adjustment (as mentioned above).

nanangizz avatar Apr 16 '24 09:04 nanangizz

Regarding issue 3, I don't think app is allowed to call pj_caching_pool_destroy() before pjmedia_conf_destroy() since the conference creation uses the pool, i.e. pjmedia_conf_create(pool, ...)? Also, using pj_atexit() doesn't seem possible since pjsip can be restarted (instead of shutdown).

The important thing here seems to be that pjmedia_conf_destroy() must guarantee the completion of all pending async operations, if any.

sauwming avatar Apr 17 '24 02:04 sauwming

Regarding issue 3, I don't think app is allowed to call pj_caching_pool_destroy() before pjmedia_conf_destroy() since the conference creation uses the pool, i.e. pjmedia_conf_create(pool, ...)? Also, using pj_atexit() doesn't seem possible since pjsip can be restarted (instead of shutdown).

Here is a sample scenario. App using PJSUA creates its own pool factory (instead of using PJSUA's) for some reason, and it uses the pool/factory to create a PJMEDIA port. It has to destroy the pool factory before shutting down PJSUA.

Re: lib (PJSUA?) restart, pj_atexit() will still be called right?

The important thing here seems to be that pjmedia_conf_destroy() must guarantee the completion of all pending async operations, if any.

Yes, pjmedia_conf_destroy() flushes all pending async ops.

nanangizz avatar Apr 17 '24 03:04 nanangizz

Here is a sample scenario. App using PJSUA creates its own pool factory (instead of using PJSUA's) for some reason, and it uses the pool/factory to create a PJMEDIA port. It has to destroy the pool factory before shutting down PJSUA.

I see.

Re: lib (PJSUA?) restart, pj_atexit() will still be called right?

Ah, right. It's called upon pj_shutdown().

sauwming avatar Apr 17 '24 03:04 sauwming