Using photon::semaphore on AWS EC2, `signal` latency is high.
I'm using photon::semaphore on AWS EC2 and found that the signal function can take 300~600 ms. I don't know what is happening or why signal takes so long. I'm seeking assistance in understanding the cause and in debugging strategies 🙏.
- How many vCPUs are you using to synchronize with this semaphore? How many cores does your EC2 instance have?
- Does this issue appear on other platforms or physical machines?
A semaphore signal consists of taking a lock plus an eventfd write (if the waiter is on another vCPU); a rough illustration of the mechanism follows.
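For illustration only (this is not Photon's actual implementation), the cross-OS-thread part of that wake-up is conceptually a state update under a lock followed by an eventfd write that the sleeping side is blocked on:

```cpp
// Illustrative sketch of waking another OS thread via eventfd -- not Photon code.
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    int efd = eventfd(0, 0);          // counter-style eventfd
    std::mutex lock;
    uint64_t sem_count = 0;

    std::thread waiter([&] {
        uint64_t v;
        read(efd, &v, sizeof(v));     // blocks until the other thread writes
        std::lock_guard<std::mutex> g(lock);
        printf("woken, count=%llu\n", (unsigned long long)sem_count);
    });

    {   // "signal": update the count under the lock...
        std::lock_guard<std::mutex> g(lock);
        ++sem_count;
    }
    uint64_t one = 1;
    write(efd, &one, sizeof(one));    // ...then wake the waiter

    waiter.join();
    close(efd);
    return 0;
}
```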
@beef9999 My vCPU count is 12, but the pod (deployed in k8s) is actually limited to 4 CPUs.
I get the vCPU count via photon::get_vcpu_num().
I didn't hit the same issue on other platforms such as Tencent Cloud.
@beef9999 Why my vCPU count is 12: I have three lists of Executors, each of length 4. They are used as the server pool, the client pool, and an internal-jobs pool.
@beef9999 "How many vCPUs are you using to synchronize with this semaphore?" I think it is 1. I create a semaphore, hand it to a std::thread, and then wait on it outside, roughly like the following (the project runs on a photon pool):

```cpp
// Simplified: AsyncRet holds a photon::semaphore, and _thread_pool is our own
// std::thread-based pool (perform() enqueues a lambda onto a worker thread).
AsyncRetSharedPtr A() {
    auto ret = std::make_shared<AsyncRet>();
    _thread_pool->perform([ret] {
        do_something();
        ret->semaphore.signal(1);   // signal from the worker std::thread
    });
    return ret;
}

int main() {
    AsyncRetSharedPtr a = A();
    a->semaphore.wait(1);           // this wait sometimes returns 300~600 ms after signal
}
```
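For reference, here is a minimal self-contained sketch of how the signal-to-wakeup gap could be measured with the same pattern (a std::thread that becomes a photon vCPU signals, the main vCPU waits). This is illustrative only, not the project's code, and it assumes the photon::init/fini and photon::semaphore APIs of recent branches:

```cpp
#include <photon/photon.h>
#include <photon/thread/thread.h>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

int main() {
    photon::init(photon::INIT_EVENT_EPOLL, photon::INIT_IO_NONE);

    photon::semaphore sem(0);
    std::atomic<int64_t> signal_ns{0};

    std::thread worker([&] {
        // Make this OS thread a photon vCPU, like the Executor pools do.
        photon::init(photon::INIT_EVENT_EPOLL, photon::INIT_IO_NONE);
        std::this_thread::sleep_for(std::chrono::milliseconds(10)); // stand-in for do_something()
        auto t = std::chrono::steady_clock::now().time_since_epoch();
        signal_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t).count();
        sem.signal(1);
        photon::fini();
    });

    sem.wait(1);
    auto t = std::chrono::steady_clock::now().time_since_epoch();
    int64_t wake_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t).count();
    printf("signal -> wakeup gap: %.3f ms\n", (wake_ns - signal_ns.load()) / 1e6);

    worker.join();
    photon::fini();
    return 0;
}
```

If the gap printed by an isolated repro like this stays small, the 300~600 ms is more likely caused by scheduling or contention inside the real process than by the semaphore itself.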
@lucaspeng12138 Which branch and event engine are you using? How many CPUs does the EC2 instance have?
@lihuiba I'm using v0.7.1, because we found it has better performance than later versions.
EC2 CPU: (screenshot of the instance's CPU info was attached here)
@lucaspeng12138 Could you try branch 0.8 and see whether the huge latency of 300~600ms exists or not? We had a major revision of semaphore since that branch, which may have fixed the issue.
BTW, could you talk more about the performance advantage of 0.7?
I think the best practice is to set the number of vCPUs to be the same as the number of physical cores (of your EC2 instance). At least do not exceed it.
Since there is a spinlock in Photon's lock implementation, I'm not sure whether there will be performance issues when multiple OS threads compete on a small set of physical cores.
@beef9999 He has 48 cpus in the EC2 instance.
@lucaspeng12138 Do you use epoll or io_uring?
> My vCPU count is 12, but the pod (deployed in k8s) is actually limited to 4 CPUs.

@lucaspeng12138 4 CPUs in total, or 4 × 12?
> We had a major revision of semaphore since that branch

And we also had another revision of the thread scheduler, which improved the latency of cross-vCPU interrupts (thread wake-ups). So trying 0.8 is highly recommended.
> Do you use epoll or io_uring?

@lihuiba I'm using epoll.
@beef9999 The pod's CPU limit is 4; the vCPU count is 12 in total per pod.
> I think the best practice is to set the number of vCPUs to be the same as the number of physical cores (of your EC2 instance). At least do not exceed it.
> Since there is a spinlock in Photon's lock implementation, I'm not sure whether there will be performance issues when multiple OS threads compete on a small set of physical cores.

@beef9999 In our case, we have three pools, and it's difficult to use just one. I may try this.
> Could you try branch 0.8 and see whether the huge latency of 300~600ms exists or not? We had a major revision of semaphore since that branch, which may have fixed the issue.
> BTW, could you talk more about the performance advantage of 0.7?

@lihuiba The semaphore optimization is still in main.
@lucaspeng12138 You can try the main branch. And is there any way you can reduce your vCPU count?
Yes, I can decrease the vCPU count by reducing the thread pool size. I will try both of your suggestions: cherry-picking the semaphore optimization from main and reducing the vCPU count.
@lucaspeng12138 Maybe you can try Photon's WorkPool instead of a custom ThreadPool. A WorkPool is able to manage all vCPUs; it uses an MPMC queue to dispatch tasks to them.
So Photon's WorkPool aims to use all physical CPUs and has much better performance than an std::thread pool? And do I need to know how many physical CPUs there are and call photon::init that many times, right?
WorkPool calls photon::init for every vCPU it creates. You can think of it as a dispatcher of tasks (functions/lambdas) to vCPUs.
See this demo https://github.com/alibaba/PhotonLibOS/blob/main/thread/test/perf_workpool.cpp
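For a rough idea of the shape of the API, here is a sketch only; the constructor arguments and call() below are my reading of the docs, so please check the demo above and thread/workerpool.h for the exact signatures:

```cpp
#include <photon/photon.h>
#include <photon/thread/workerpool.h>
#include <cstdio>

int main() {
    photon::init(photon::INIT_EVENT_EPOLL, photon::INIT_IO_NONE);

    // 4 worker OS threads; the pool initializes each of them as a photon vCPU.
    // The last argument selects how worker photon threads are allocated per task.
    photon::WorkPool pool(4, photon::INIT_EVENT_EPOLL, photon::INIT_IO_NONE, 0);

    // Dispatch a task to one of the worker vCPUs and wait for it to complete.
    pool.call([] {
        printf("hello from a WorkPool vCPU\n");
    });

    photon::fini();
    return 0;
}
```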
> The pod's CPU limit is 4; the vCPU count is 12 in total per pod.

In case there are more vCPUs (12) than physical cores available (4), it is possible that spinlocks spend much more CPU time than usual. This is not specific to Photon, but inherent to spinlocks. I'm not sure whether this is the reason, but you could try giving the pod enough (12) CPU cores for testing and see whether it helps.
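One quick (and admittedly rough) sanity check for the over-subscription theory is to compare photon::get_vcpu_num() inside the service with the CPUs the process is actually allowed to run on. Note that Kubernetes CPU limits are usually enforced via CFS quota rather than a cpuset, so the affinity mask may still show the node's full core count; treat this as a hint, not proof:

```cpp
// How many CPUs can this process actually run on? Compare the result with
// photon::get_vcpu_num() reported inside the real service.
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        printf("CPUs in affinity mask: %d\n", CPU_COUNT(&set));
    return 0;
}
```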
I increased the pod's CPU request and limit to more than 12, but the issue still exists. I'm too busy to keep digging into the root cause right now; I will try your advice to patch in photon::semaphore from main and report the test results later.
> I increased the pod's CPU request and limit to more than 12, but the issue still exists.

So this rules out spinlocks as the cause.
> I'm too busy to keep digging into the root cause right now

Can you give a minimal example that demonstrates the issue?