WorkerPool usage uncertainty (multi vcpus)

Open talhasaruhan opened this issue 1 year ago • 9 comments

I know the documentation will be done after feature work is complete, however I'd like some pointers in the meantime, thank you.

As far as I can see, WorkerPool creates a std::thread for each vcpu, and each of those threads is then initialized for photon thread usage. They then enter a main loop where they take tasks off a shared ring queue and execute them.
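To make the model described above concrete, here is a minimal sketch of a pool whose OS threads pull tasks from one shared queue. It is not Photon's WorkPool: `ToyWorkerPool`, `submit`, and `run_counter_demo` are hypothetical names, and a mutex plus condition variable stands in for the ring queue and photon semaphore mentioned in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a worker pool; NOT Photon's implementation.
class ToyWorkerPool {
public:
    explicit ToyWorkerPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { main_loop(); });
    }

    // Lets the workers drain any queued tasks, then joins them.
    ~ToyWorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    // Each worker blocks until a task arrives or shutdown is requested.
    void main_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

// Usage: submit `jobs` increments and return the final counter value.
int run_counter_demo(int jobs) {
    assert(jobs >= 0);
    std::atomic<int> counter{0};
    {
        ToyWorkerPool pool(4);
        for (int i = 0; i < jobs; ++i)
            pool.submit([&counter] { ++counter; });
    }  // the destructor drains the queue and joins the workers
    return counter.load();
}
```

In this sketch a worker only ever runs what was explicitly submitted to the queue, which is exactly why the migration case in the question needs a second mechanism.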

On the other hand, in net-perf.cpp:
1- A photon thread gets created by the server (accept_loop creates a new photon thread upon accept).
2- Initially this photon thread is on the OS thread that initialized the server.
3- In the handler, as the first thing, it gets migrated to one of the work_pool vcpus, thus freeing the main vcpu.

Before I ask my question, I want to say that this is my understanding so far; please correct me if I'm wrong. What I'm having trouble understanding is how the vcpu ends up executing the migrated photon thread. It seems like the ring queue for the work pool should be empty, since no one queued anything. So the main_loop function that gets executed by the OS thread yields at ring.recv(). However, since the default template parameter is std::this_thread::yield, that just means the OS thread yields back to the OS scheduler, right? So photon::thread_yield is not called, and I don't understand how the migrated photon thread ends up getting executed.

talhasaruhan avatar Sep 12 '22 22:09 talhasaruhan

That's basically two typical usages of work pool.

The main_loop in the work pool handles requests that you submit through work_pool->call or work_pool->async_call. Both take a function or a callable object (lambda) and send tasks through a RingQueue. Each loop blocks on a photon::semaphore and drains tasks from the queue.

The other usage is to create a Photon thread yourself and migrate it to another vcpu. Consider a simplified main_loop in the work pool OS thread that is just running photon::thread_usleep(-1). Even though it has no jobs to do in the foreground, the event engine in io/fd-events.h is working in the background. It will still schedule photon threads with the help of the standbyq (standby queue) of the vcpu, in thread/thread.cpp.
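The standby-queue idea can be modelled with the standard library alone. The sketch below is an assumption-laden toy, not Photon's code: `ToyVcpu`, `migrate_in`, and `request_stop` are hypothetical names, a migrated photon thread is represented by a plain callable (the real library moves a suspended stackful coroutine), and a condition variable plays the role of the event engine that wakes an idle vcpu.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model of one vCPU; names are illustrative, NOT Photon's.
class ToyVcpu {
public:
    using Runnable = std::function<void()>;

    // May be called from any OS thread: hand a runnable over to this vCPU.
    void migrate_in(Runnable r) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            standby_.push_back(std::move(r));
        }
        wake_.notify_one();  // interrupt the idle sleep
    }

    void request_stop() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        wake_.notify_one();
    }

    // Runs on the vCPU's own OS thread.
    void run() {
        for (;;) {
            std::deque<Runnable> ready;
            {
                std::unique_lock<std::mutex> lock(mu_);
                // Idle: nothing to run locally, so sleep until woken.
                wake_.wait(lock, [this] { return stop_ || !standby_.empty(); });
                if (standby_.empty()) return;  // stop requested, nothing pending
                ready.swap(standby_);          // adopt everything migrated in
            }
            for (auto& r : ready) r();
        }
    }

private:
    std::mutex mu_;
    std::condition_variable wake_;
    std::deque<Runnable> standby_;
    bool stop_ = false;
};

// Usage: returns true if the migrated work ran on the vCPU's OS thread
// rather than on the thread that called migrate_in().
bool migrated_work_runs_on_vcpu_thread() {
    ToyVcpu vcpu;
    std::thread os_thread([&vcpu] { vcpu.run(); });
    const std::thread::id vcpu_id = os_thread.get_id();

    std::thread::id ran_on;
    vcpu.migrate_in([&ran_on] { ran_on = std::this_thread::get_id(); });
    vcpu.request_stop();
    os_thread.join();  // join() makes the write to ran_on visible here

    return ran_on == vcpu_id && ran_on != std::this_thread::get_id();
}
```

The point of the model is that the idle vcpu never polls a task queue for the migrated work; another thread places it in the standby queue and wakes the sleeper.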

beef9999 avatar Sep 13 '22 03:09 beef9999

BTW, which project are you working on? You seem to have dived a lot deeper than other developers :-)

beef9999 avatar Sep 13 '22 03:09 beef9999

Thanks for the explanation. However, I'm not sure where the photon::thread_usleep(-1) call is. As I said, all I see is the std::this_thread::yield() call from ring.recv(). Could you provide a file and line number, please?

As for your question, I'm not working in any professional capacity. I was just trying to understand the code before deciding which library to pick for experimenting. As you can imagine, these kinds of coroutine and async I/O libraries are quite invasive in a codebase.

talhasaruhan avatar Sep 13 '22 15:09 talhasaruhan

thread_usleep(-1) means sleep forever. It is in thread.cpp. When there are no other jobs, the scheduler will enter the idle_stub.

beef9999 avatar Sep 13 '22 18:09 beef9999

Migrating legacy code to Photon is easy. See the release notes of version 0.3. We managed to turn RocksDB into a coroutine-based program simply by searching and replacing code, and most of the CI tests passed.

beef9999 avatar Sep 13 '22 18:09 beef9999

If you are migrating threads yourself, the main loop will block at the semaphore wait, and the semaphore wait will eventually call thread_usleep, so these two approaches are the same.

beef9999 avatar Sep 13 '22 18:09 beef9999

Ahh, OK, now I see it. Thank you for the explanation! The reason I'm trying to understand this in more depth is that I'm curious about the possibility of using Photon not just for I/O but also as a job system (on a different worker pool than the I/O one).

talhasaruhan avatar Sep 14 '22 10:09 talhasaruhan

I have a few other questions about the library, not sure if this is the right place to ask but here they are:

1- As far as I understand, the socket server itself runs on a single OS thread and dispatches jobs from there, right? Not sure if this would be an improvement, but have you explored the idea of the server itself also running on a worker pool?

2- The worker pool seems to support a thread pool, I'm assuming to avoid spinning up a new thread for every single job (threads are stackful coroutines, so their stacks would consume a lot of memory). 2-a If it's not much trouble, could you explain how the thread pool works? 2-a Are you planning on supporting C++20 stackless coroutines?

3- In my humble opinion, from reading the codebase, it's not obvious which calls are going to cause the thread to yield (and since the threads are stackful, any call down the call stack can decide to yield). Do you have any plans to improve that? This is especially problematic if you want to use the worker pool / thread pool concept as a general-purpose job system, since the yield points will be well hidden.

talhasaruhan avatar Sep 14 '22 11:09 talhasaruhan

1- Yes, and it's also possible for multiple vCPUs (OS threads) to run the same socket server, so as to accept a lot of connections in a short time;

2- A stack allocation is large, but it only consumes a piece of address space; the OS kernel allocates real page frames, via page faults, when they are actually used. So the stacks do not consume too much memory. The thread pool is used to avoid re-allocating threads / stacks, so as to speed up thread creation by keeping more memory reserved.

2-a Yes, we are planning to support C++20 stackless coroutines, in a hybrid manner.

3- Yielding happens only by calling photon::thread_usleep() or photon::thread_yield(). And yes, any function can decide to yield, if needed. It may even happen under some nondeterministic condition(s). We usually assume that any function we call may yield, unless we are sure it doesn't. In C++, we can specify whether a function can throw, but there's no way to specify whether a function can yield (call a specified function) or not. Do you have any suggestions to improve this?
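For reference, the throw side of that comparison looks like this in standard C++. The function names below are hypothetical and the snippet only illustrates the language feature; it says nothing about Photon's API.

```cpp
// Hypothetical functions used only to illustrate the language feature.
inline void may_throw() {}              // no exception specification
inline void never_throws() noexcept {}  // promises not to throw

// The noexcept operator inspects the declaration at compile time;
// its operand is never evaluated.
constexpr bool kMayThrowIsNoexcept = noexcept(may_throw());
constexpr bool kNeverThrowsIsNoexcept = noexcept(never_throws());

static_assert(!kMayThrowIsNoexcept, "unannotated functions are assumed to throw");
static_assert(kNeverThrowsIsNoexcept, "noexcept is visible to callers");

// Standard C++ has no counterpart for yielding: there is no specifier or
// operator that lets a caller ask whether a function may switch coroutines.
```

Note that an unannotated function is treated as potentially throwing even when its body is empty, which mirrors the "assume it may yield unless we are sure" convention described above.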

BTW, I'm curious: why are yield points so important to your job system?

lihuiba avatar Sep 14 '22 15:09 lihuiba