Perf degradation seen for OpenVino brain_tumor_seg_0001_fp16 with "[LibOS] Always allocate a slot for a wakeup handle in do_epoll_wait" commit
Description of the problem
Copied below are the results for one of the OpenVino models when run with Gramine SGX from Sept 21st and Sept 22nd. We are seeing a 13% increase in degradation (Linux Native/Gramine SGX) with the "[LibOS] Always allocate a slot for a wakeup handle in do_epoll_wait" commit.
If you compare just the Gramine SGX numbers, there's almost a 20% drop.
This experiment was performed on 2 machines, and similar degradation was observed on both with the Sept 22nd commit.
Steps to reproduce
This test requires an enclave size of 128GB, so let me know if you need a server to reproduce the issue.
Expected results
Actual results
Gramine commit hash
First of all, the standard deviation relative to the average is ~15% in those numbers, so a 13% difference in degradation could be just statistical noise. Secondly, this is a bug fix for an issue with potential security implications, and there doesn't seem to be any other way of solving it, so I don't think we can do much.
Yes, the standard deviation seems very high. On the other hand, it certainly feels like the "[LibOS] Always allocate a slot for a wakeup handle in do_epoll_wait" commit degraded performance generally.
Now that I think of it, we introduced a malloc + free in the generic case of epoll_wait() (we allocate a one-slot array first, then we re-allocate it with the actual number of handles to poll). And we have already hit the inefficient implementation of the multi-threaded memory allocator in Gramine several times. So this is yet another instance of this problem.
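To make the allocation pattern above concrete, here is a minimal sketch in plain C. It is not Gramine's actual code: `struct poll_slot`, `build_slots_two_step`, `release_slots`, and the counting wrappers are hypothetical names introduced only for illustration, and libc `malloc`/`free` stand in for Gramine's internal allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pollable handle; not Gramine's real type. */
struct poll_slot {
    int fd;
};

/* Counters so the extra allocator round trip is visible. */
size_t g_malloc_calls;
size_t g_free_calls;

static void* counted_malloc(size_t size) {
    g_malloc_calls++;
    return malloc(size);
}

static void counted_free(void* ptr) {
    g_free_calls++;
    free(ptr);
}

/* Two-step pattern: slot 0 is always reserved for the wakeup handle, then the
 * array is re-allocated once the number of handles to poll is known. */
struct poll_slot* build_slots_two_step(const int* fds, size_t n, int wakeup_fd) {
    struct poll_slot* slots = counted_malloc(sizeof(*slots));
    if (!slots)
        return NULL;
    slots[0].fd = wakeup_fd;

    struct poll_slot* grown = counted_malloc((n + 1) * sizeof(*grown));
    if (!grown) {
        counted_free(slots);
        return NULL;
    }
    grown[0] = slots[0];
    for (size_t i = 0; i < n; i++)
        grown[i + 1].fd = fds[i];

    counted_free(slots); /* the extra free on the common path */
    return grown;
}

/* Releases the array returned by build_slots_two_step(). */
void release_slots(struct poll_slot* slots) {
    counted_free(slots);
}
```

In this sketch, each call to `build_slots_two_step()` goes through the allocator three times (two allocations and one free) before any polling starts. With an allocator that serializes on a single lock, each of those trips is a point where threads can contend.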
I think we should very seriously consider reimplementing our (slab) memory allocator, moving away from a global lock to more fine-grained (per-thread) locking.
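As a rough illustration of what moving away from a global lock could look like, below is a minimal sketch of a single-size-class allocator with a per-thread cache in front of a globally locked free list. Everything here is an assumption made for the example: the names (`slab_alloc`, `slab_free`, `OBJ_SIZE`, `CACHE_MAX`), the C11 `atomic_flag` spinlock, and the use of libc `malloc` as a stand-in for carving fresh slab memory. It does not reflect Gramine's existing slab allocator or any agreed design.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define OBJ_SIZE  64 /* single size class, for brevity */
#define CACHE_MAX 32 /* max objects kept in each thread's cache */

struct free_obj {
    struct free_obj* next;
};

/* Shared pool, guarded by one global spinlock (the contended path). */
static atomic_flag g_lock = ATOMIC_FLAG_INIT;
static struct free_obj* g_free_list;
size_t g_lock_acquisitions; /* only modified while g_lock is held */

/* Per-thread cache: touched only by its owning thread, so no lock is needed. */
static _Thread_local struct free_obj* t_cache;
static _Thread_local size_t t_cache_len;

static void global_lock(void) {
    while (atomic_flag_test_and_set_explicit(&g_lock, memory_order_acquire)) {
        /* spin */
    }
    g_lock_acquisitions++;
}

static void global_unlock(void) {
    atomic_flag_clear_explicit(&g_lock, memory_order_release);
}

void* slab_alloc(void) {
    /* Fast path: reuse an object from this thread's cache, lock-free. */
    if (t_cache) {
        struct free_obj* obj = t_cache;
        t_cache = obj->next;
        t_cache_len--;
        return obj;
    }

    /* Slow path: fall back to the shared pool under the global lock. */
    global_lock();
    struct free_obj* obj = g_free_list;
    if (obj)
        g_free_list = obj->next;
    global_unlock();

    /* Stand-in for obtaining fresh slab memory. */
    return obj ? (void*)obj : malloc(OBJ_SIZE);
}

void slab_free(void* ptr) {
    if (!ptr)
        return;
    struct free_obj* obj = ptr;

    /* Fast path: keep the object in this thread's cache. */
    if (t_cache_len < CACHE_MAX) {
        obj->next = t_cache;
        t_cache = obj;
        t_cache_len++;
        return;
    }

    /* Cache is full: return the object to the shared pool. */
    global_lock();
    obj->next = g_free_list;
    g_free_list = obj;
    global_unlock();
}
```

With this layout, a short-lived allocate/free pair on the same thread (such as the temporary one-slot array in epoll_wait()) would take the global lock at most once, on the first miss. A real design would also need to handle multiple size classes, draining caches when a thread exits, and objects freed by a different thread than the one that allocated them.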
I'm not sure how this malloc + free makes things that much slower here, when we already have a malloc there. Also, it's epoll_wait(); it almost always sleeps for some indeterminate amount of time.
But yes, we have to rework malloc anyway (but I doubt it will help much in this case).