
Perf degradation seen for OpenVino brain_tumor_seg_0001_fp16 with "[LibOS] Always allocate a slot for a wakeup handle in do_epoll_wait" commit

Open jinengandhi-intel opened this issue 3 years ago • 3 comments

Description of the problem

Copied below are the results for one of the OpenVino models when run with Gramine SGX on Sept 21st and Sept 22nd. We are seeing a 13% increase in degradation (Linux Native/Gramine SGX) with the "[LibOS] Always allocate a slot for a wakeup handle in do_epoll_wait" commit.

If you compare just the Gramine SGX numbers, there's almost a 20% drop.

image

This experiment was performed on 2 machines, and a similar degradation is observed on both with the Sept 22nd commit.

Steps to reproduce

This test requires an enclave size of 128GB, so let me know if you need a server to reproduce the issue.

Expected results

Actual results

Gramine commit hash

jinengandhi-intel avatar Sep 24 '22 09:09 jinengandhi-intel

First of all, the standard deviation compared to the average is ~15% in those numbers, so a 13% difference in degradation could be just a statistical error. Secondly, this is a bug fix for an issue with potential security implications, and there doesn't seem to be any other way of solving it, so I don't think we can do much.

boryspoplawski avatar Sep 24 '22 18:09 boryspoplawski

Yes, standard deviation seems very high. On the other hand, it certainly feels like the [LibOS] Always allocate a slot for a wakeup handle in do_epoll_wait commit degraded performance generally.

Now that I think of it, we introduced a malloc + free in the generic case of epoll_wait() (we allocate a one-slot array first, then we re-allocate it with the actual number of handles to poll). And we have already hit the inefficient implementation of the multi-threaded memory allocator in Gramine several times. So this is yet another instance of this problem.
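
To make the allocation pattern described above concrete, here is a minimal sketch in plain C. It is not Gramine's actual code: `struct handle`, `build_poll_array`, and the call counter `g_allocator_calls` are hypothetical names introduced only for illustration, and the standard `malloc`/`free` stand in for Gramine's internal allocator. The point is only that the generic path now makes three allocator calls (two allocations and one free) where a single allocation sized up front would make one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pollable handle; not Gramine's real type. */
struct handle {
    int fd;
};

/* Counts every trip into the allocator so the extra work is visible. */
static size_t g_allocator_calls = 0;

static void* counted_malloc(size_t size) {
    g_allocator_calls++;
    return malloc(size);
}

static void counted_free(void* ptr) {
    g_allocator_calls++;
    free(ptr);
}

/*
 * Builds the array of handles to poll. Slot 0 is always reserved for the
 * wakeup handle; if there are `n` more handles, the one-slot array is
 * replaced by a larger one (the "malloc + free" mentioned above).
 */
static struct handle** build_poll_array(struct handle* wakeup,
                                        struct handle** items, size_t n,
                                        size_t* out_len) {
    assert(out_len != NULL);

    /* Step 1: one-slot array holding only the wakeup handle. */
    struct handle** arr = counted_malloc(sizeof(*arr));
    if (!arr)
        return NULL;
    arr[0] = wakeup;
    *out_len = 1;
    if (n == 0)
        return arr;

    /* Step 2: re-allocate with the actual number of handles. */
    struct handle** bigger = counted_malloc((n + 1) * sizeof(*bigger));
    if (!bigger) {
        counted_free(arr);
        return NULL;
    }
    bigger[0] = arr[0];
    memcpy(&bigger[1], items, n * sizeof(*items));
    counted_free(arr);

    *out_len = n + 1;
    return bigger;
}
```

With a global allocator lock, each of those calls is a potential contention point when many threads call epoll_wait() concurrently.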

I think we should very seriously consider reimplementing our (slab) memory allocator, moving away from a global lock to more fine-grained (per-thread) locking.
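
As a rough illustration of what moving away from a global lock could look like, below is one possible shape: a per-thread free list placed in front of a shared pool. This is a sketch under stated assumptions, not a proposal for the actual Gramine allocator. All names (`cache_alloc`, `cache_free`, `cache_flush`, `OBJ_SIZE`) are hypothetical, a single size class is assumed, a C11 `atomic_flag` spinlock stands in for whatever lock the real allocator uses, and `malloc` stands in for the backing slab.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define OBJ_SIZE 64 /* single hypothetical size class */

/* Free objects are linked through their own storage. */
struct free_obj {
    struct free_obj* next;
};

/* Shared pool guarded by one global spinlock: the contended path. */
static atomic_flag g_lock = ATOMIC_FLAG_INIT;
static struct free_obj* g_pool = NULL;
static atomic_size_t g_lock_acquisitions = 0;

/* Per-thread cache: only the owning thread touches it, so no lock. */
static _Thread_local struct free_obj* t_cache = NULL;

static void global_lock(void) {
    while (atomic_flag_test_and_set_explicit(&g_lock, memory_order_acquire))
        ; /* spin until the flag is released */
    atomic_fetch_add(&g_lock_acquisitions, 1);
}

static void global_unlock(void) {
    atomic_flag_clear_explicit(&g_lock, memory_order_release);
}

/* Fast path: pop from the thread cache. Slow path: take the global lock. */
static void* cache_alloc(void) {
    struct free_obj* obj = t_cache;
    if (obj) {
        t_cache = obj->next;
        return obj;
    }
    global_lock();
    obj = g_pool;
    if (obj)
        g_pool = obj->next;
    global_unlock();
    return obj ? (void*)obj : malloc(OBJ_SIZE);
}

/* Freed objects go to the calling thread's cache without locking. */
static void cache_free(void* ptr) {
    assert(ptr != NULL);
    struct free_obj* obj = ptr;
    obj->next = t_cache;
    t_cache = obj;
}

/* Returns the whole thread cache to the shared pool (e.g. at thread exit). */
static void cache_flush(void) {
    if (!t_cache)
        return;
    struct free_obj* tail = t_cache;
    while (tail->next)
        tail = tail->next;
    global_lock();
    tail->next = g_pool;
    g_pool = t_cache;
    global_unlock();
    t_cache = NULL;
}
```

In this sketch a short-lived allocation that is freed and re-allocated on the same thread never touches the global lock after the first call. A real design would also need to bound the per-thread cache and handle objects freed by a different thread than the one that allocated them.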

dimakuv avatar Sep 26 '22 06:09 dimakuv

I'm not sure how this malloc + free makes things that much slower here, given that we already have a malloc there. Also, it's epoll_wait(); it almost always sleeps for some indeterminate amount of time.

But yes, we have to rework malloc anyway (but I doubt it will help much in this case).

boryspoplawski avatar Sep 26 '22 12:09 boryspoplawski