
Distribute worker threads to all cores

Open zhanghongzhou opened this issue 10 years ago • 12 comments

Signed-off-by: hongzhou zhang [email protected]

zhanghongzhou avatar Nov 23 '15 05:11 zhanghongzhou

@zhanghongzhou thanks a lot for your PR. Could you provide benchmarking results? Of course, they don't have to be good for now.

mitake avatar Nov 24 '15 04:11 mitake

@johnzhang1985 please comment in English :)

mitake avatar Nov 24 '15 04:11 mitake

I can test this patch and compare to old version.

vtolstov avatar Nov 24 '15 06:11 vtolstov

@zhanghongzhou @johnzhang1985 @vtolstov

I created an additional patch for creating per cpu work queue. You can find it here: https://github.com/sheepdog/sheepdog/tree/per-cpu-work-queue

It is not well tested. I would be glad if you could test it and share benchmark results.

mitake avatar Nov 24 '15 07:11 mitake

@mitake:

We plan to test sheepdog/per-cpu-work-queue next quarter. The branch is based on a slightly old master (04fe301, committed on 13 Nov 2015), so I first tried to rebase it onto the latest master (8772904, 22 days ago).

However, I ran into a rebase conflict. After some trial and error, I found that the branch conflicts with commit 698e5d2, which merged PR #217.

So I made my own two branches:

  • The first, per-cpu-work-queue-1, is rebased onto an "alternative master" that skips the conflicting commit above. It hit neither conflicts nor compile errors, but its history diverges significantly from the true master.
  • The second, per-cpu-work-queue-2, is simply rebased onto the latest master. It hit both conflicts and compile errors, which I resolved, but I am not fully confident my resolution is correct.

I would be very glad if you could review my branches, especially the last few commits of the second one. I think it is slightly better than the first.

tmenjo avatar Mar 30 '16 09:03 tmenjo

@tmenjo I think the second branch should be OK. Could you squash the commits that resolve the conflicts? Then they can be merged.

mitake avatar Apr 01 '16 07:04 mitake

@mitake I force-updated per-cpu-work-queue-2. The previous four commits are now squashed into three: zhanghongzhou's, mitake's (into which my conflict-resolution commit is melded), and another of mine that fixes the compile error. Could you review again?

tmenjo avatar Apr 01 '16 08:04 tmenjo

I tried the https://github.com/tmenjo/sheepdog/tree/per-cpu-work-queue-2 branch. It did not behave as expected.

I started sheepdog and ran the taskset command. The CPU affinity mask of every thread still covered all processors, as follows.

# taskset -p -a `pgrep sheep| head -1`
pid 32710's current affinity mask: ffffff
pid 32712's current affinity mask: ffffff
pid 32713's current affinity mask: ffffff
            (snip)

Root-cause analysis shows that pthread_setaffinity_np() is never called, because 'wi->q.wq_state' is never assigned after initialization. 'wi->q.wq_state' and pthread_setaffinity_np() are used as follows.

    if (wi->q.wq_state != WQ_ORDERED &&
        pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset))

'wi->q.wq_state' is always 0, and WQ_ORDERED's enum value is also 0, so pthread_setaffinity_np() is never executed.

I think 'wi->q.wq_state' should be changed to 'wi->tc'. I modified the source code as follows.

# diff -u work.c.org work.c
--- work.c.org  2016-04-21 17:44:30.889242586 +0900
+++ work.c      2016-04-25 18:56:36.529934803 +0900
@@ -293,7 +293,7 @@
                        return -1;
                }
-               if (wi->q.wq_state != WQ_ORDERED &&
+               if (wi->tc != WQ_ORDERED &&
                    pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset))
                        sd_info("set thread affinity failed");
@@ -312,7 +312,7 @@
        struct wq_info *wi = container_of(q, struct wq_info, q);
        int cpu;
-       if (q->wq_state == WQ_ORDERED)
+       if (wi->tc == WQ_ORDERED)
                cpu = 0;
        else if (is_main_thread()) {
                static __thread int next_core;
@@ -379,7 +379,7 @@
        /* wait pthread_setaffinity_np() in create_worker_threads() */
        eventfd_xread(wi->create_efd);
        /* now I'm running on correct cpu */
-       cpu = wi->q.wq_state == WQ_ORDERED ? 0 : mycpu();
+       cpu = wi->tc == WQ_ORDERED ? 0 : mycpu();
        while (true) {

I started sheepdog with the code above and ran the taskset command again. The threads' CPU affinities were now spread across distinct processors, as follows.

# taskset -p -a `pgrep sheep| head -1`
pid 31032's current affinity mask: ffffff
pid 31034's current affinity mask: 1
pid 31035's current affinity mask: 2
pid 31036's current affinity mask: 4
pid 31037's current affinity mask: 8
pid 31038's current affinity mask: 10
pid 31039's current affinity mask: 20
pid 31040's current affinity mask: 40
pid 31041's current affinity mask: 80
            (snip)

sa-kura avatar Apr 26 '16 05:04 sa-kura

@sa-kura thanks a lot for your analysis. Is it possible to evaluate performance after your change?

mitake avatar Apr 26 '16 05:04 mitake

@mitake Yes. I will evaluate performance together with @tmenjo.

sa-kura avatar Apr 26 '16 06:04 sa-kura

@mitake @tmenjo is this still useful now, after the patches that control workqueue size?

vtolstov avatar Mar 06 '17 20:03 vtolstov

@vtolstov I'm not sure yet. But if you can benchmark https://github.com/sheepdog/sheepdog/tree/per-cpu-work-queue rebased onto master, it would be helpful.

mitake avatar Mar 07 '17 01:03 mitake