doris [Enhancement] better merge single_replica_load_rpc_service into brpc

[Enhancement] better merge single_replica_load_rpc_service into brpc_service

Open freemandealer opened this issue 1 year ago • 0 comments

Search before asking

[X] I had searched in the issues and found no similar issues.

Description

better merge single_replica_load_rpc_service into brpc_service

refer to these to see what is single replica load: single replica load design single replica implement

The code would be much clarified if there is only one rpc service. The reason why the original code uses two services is that rpc will time out under heavy workloads if use just one.

However, according to the design of brpc thread model, even if running in separated service thread pools, the underlying pthreads/cpu cores are shared by the two services. I think the reason why the two services solution works is only because it claims more worker pthreads, which can also be achieved by simply setting bigger --bthread_concurrency gflag config.

So two services achieve nothing but messyness. We should merge them into one and identify&fix the root cause of the timeout problem.

Solution

step1. merge single_replica_load_rpc_service into brpc_service

step2. test the system with heavy load jobs to try to reproduce the rpc timeout problem

step3. make assumptions & proving

some of my guesses:
void PInternalServiceImpl::request_slave_tablet_pull_rowset(...) {
    ...
    worker_pool.offer([=]() {
       ...     
    }
    ...
}
if the worker pool is full (possible under heavy loads), the pthread will wait under std::condition_variable. If most of the pthreads in the pool are blocked, there would be few pthread to deal with rpc request and then timeout happens.

If it is true, we need to use bthread's condition variable instead of std::condition_variable. The former one will only yeild the underlying pthread to other rpc jobs, rather than blocking it.

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Mar 02 '23 07:03 freemandealer

doris doris copied to clipboard

[Enhancement] better merge single_replica_load_rpc_service into brpc_service

Search before asking

Description

Solution

Are you willing to submit PR?

Code of Conduct

doris
doris copied to clipboard