Add "Runtime basics" to the tutorial
This came up while discussing https://github.com/ponylang/ponylang-website/issues/502 (add quiescence FAQ). Proposed topics:
- Garbage collector basics (there are 2 entries in the appendix)
- Quiescence
- ASIO system
- Env.exitcode
Perhaps a link to runtime options, also information on how to find more information.
Any thoughts on where this chapter should go? My opinion is after Packages and before Testing, as the former ends on the Standard Library and the latter starts into sort of advanced Pony usage.
Adding a task list to ensure I stay organized (this task list was built over the course of writing):
- [x] Add new chapter and weight accordingly in config for menu placement
- [x] Add section on Env.exitcode
- [x] Add section on garbage collector
- [x] Add section on ASIO system
- [x] Add section on quiescence (see #157, as well)
- [x] Add Backpressure/Muting section
- [x] Add section on Runtime Options
After re-reading both the runtime-related appendices (Memory Allocation at Runtime, and Garbage Collection with Pony-ORCA), I think this information should be moved into the new chapter rather than repeated in the appendix.
Also, Memory Allocation at Runtime ends with an ellipsis. Was there something more intended to go there?
@rhagenson I have no idea why the ellipsis is there.
@rhagenson do you have all the info you need?
@SeanTAllen I believe so. I have begun work over on https://github.com/rhagenson/pony-tutorial/tree/runtime-chapter So far it has mostly been a refactor to move the runtime content from the Appendices into the new chapter and rewrite that content to stress the pertinent details.
Friendly update that I am back on top of this. Had to take a hiatus while, among other things, I prepared a tutorial submission for the largest conference in my field. That tutorial will include Pony, if accepted.
Sort of preemptive question so feel free to answer solely off the cuff: what have been some past references for any of this information? E.g., blog post on how the ASIO system works, video on quiescence, past issue/PR that involved ample discussion of garbage collection, etc. Looking to make the language consistent between what has been deemed good/helpful in the past and the tutorial.
re: gc... see the orca paper: https://www.ponylang.io/media/papers/orca_gc_and_type_system_co-design_for_actor_languages.pdf and this video from @aturley: https://vimeo.com/181099993
Already had the ORCA paper, but not the video. Thank you for both.
Any similarly useful references for the other topics?
There are some early discussions from the beginning of the #runtime stream in Zulip (that were actually copied from the Slack) that cover a few runtime topics.
@EpicEric I had no idea there even was a runtime stream on Zulip. Thank you for pointing me toward it.
Reminder to self (and noted here for others to hold me to it):
- [ ] Find good place to mention the number of scheduler threads is N where N is core count + 1 ASIO thread
The way I have it written now this detail would be forced or detract from the point wherever I put it. I am sure the content will change so not going to force it now when later unforced addition is still possible.
Number of scheduler threads is N where N is the core count. There is additionally an ASIO thread that handles receiving ASIO messages; however, it never runs any actors. When we say "scheduler threads", the ASIO thread is not included in that.
Understand the distinction now. Still will need to find a good place for this detail at a later time to ensure it is not just dropped in somewhere.
Couple of things to know about scheduler threads.
You can change the default number using --ponymaxthreads to set it to fewer than N, where N is the number of cores.
By default, the runtime will stop using scheduler threads that "aren't needed". This helps keep excessive work stealing from happening. This can scale down to 0 scheduler threads. At 0, the only thread running will be the ASIO thread that is waiting to receive an event. Once an event is received, at least 1 scheduler thread will be started back up. You can set a minimum number of scheduler threads to always keep running using the --ponyminthreads option.
If you want to, you can turn off scheduler thread scaling by using --ponynoscale.
There is also a --ponysuspendthreshold option that has an impact on scheduler thread scaling.
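To make the thread-count rules above concrete, here is a minimal Python sketch. It is a toy model, not runtime code: the class, field, and method names are hypothetical, the fields only stand in for the command-line options named above, and the assumption that turning scaling off keeps every scheduler thread active is my reading of the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SchedulerConfig:
    """Toy model of the scheduler thread settings (hypothetical names)."""

    cores: int                         # N, the number of cores
    max_threads: Optional[int] = None  # stands in for --ponymaxthreads
    min_threads: int = 0               # stands in for --ponyminthreads
    no_scale: bool = False             # stands in for --ponynoscale

    def scheduler_threads(self) -> int:
        # Default is one scheduler thread per core; the max option can
        # only lower that number.
        if self.max_threads is None:
            return self.cores
        return min(self.max_threads, self.cores)

    def active_range(self) -> Tuple[int, int]:
        # Range of scheduler threads that may be running at any moment.
        upper = self.scheduler_threads()
        if self.no_scale:
            # Assumption: with scaling off, no thread is ever suspended.
            return (upper, upper)
        return (min(self.min_threads, upper), upper)

    def total_threads(self) -> int:
        # The ASIO thread is counted separately; it never runs actors.
        return self.scheduler_threads() + 1
```

For example, `SchedulerConfig(cores=8).active_range()` gives `(0, 8)`: by default the runtime can scale down to zero scheduler threads, leaving only the ASIO thread waiting for an event.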
Currently thinking, given the distinction just noted, that this might naturally fit into the ASIO system section in the reverse of the way stated here, i.e.: there is one ASIO thread + N scheduler threads...
For all the runtime-related options, rather than spread them throughout the new chapter, how about a section called "Runtime Options" that is the last section in the chapter and covers these runtime configuration options like pinning ASIO, changing minimum scheduler thread count, etc?
What states can an actor be in besides: alive, blocked, dead, and muted?
- Alive: running a behavior or processing a message from its queue
- Blocked: completed execution and no messages waiting in its queue
- Dead: blocked itself and all actors with a reference to it are blocked
- Muted: attempted to send to an overloaded actor while not itself overloaded (the result of backpressure; it will be scheduled again once backpressure decreases)
These are the ones I know of by reading through the #runtime stream and past runtime content from the tutorial. I want to ensure I am not neglecting an actor state.
@rhagenson I'm not aware of Dead being used as a term.
- Alive -> Scheduled
- Muted -> Muted
- Blocked -> Unscheduled
- Dead -> There is no state for this in the runtime.
For Unscheduled, a distinction could be made between "has no messages and therefore doesn't exist in a queue for a scheduler thread" and "has messages and is waiting in Scheduler thread's queue".
That would give 4 states. But we don't have agreed-upon terminology for those 2 possible "unscheduled" states.
Blocked would be one possible term for the first unscheduled state (and is noted in actor.c as being "logically blocked"). We don't have a name afaik for the 2nd of the 2. Generally "Blocked" is mostly used when the cycle detector is in use. I think it would be reasonable to use as you have defined.
There is also "overloaded" and "under pressure" that could be considered states as well that are separate.
Sorry if this doesn't help much. I'm trying to provide more info. You are asking good questions.
EDIT
I'm realizing that "Unscheduled" might be problematic as there is a flag you can set via the C api called FLAG_UNSCHEDULED to manually remove an actor from scheduling. It isn't used anymore but it exists. This conversation is making me realize that we should definitely discuss an RFC to remove it.
EDIT 2
re: C api -> there is a C api that is exposed that allows you to control various parts of the runtime including starting it up, scheduling actors, creating them etc. It isn't used by Pony but could be used to embed the Pony runtime in other systems.
> For all the runtime-related options, rather than spread them throughout the new chapter, how about a section called "Runtime Options" that is the last section in the chapter and covers these runtime configuration options like pinning ASIO, changing minimum scheduler thread count, etc?
This sounds reasonable.
@SeanTAllen Thank you for the information.
For your own knowledge of where "dead" cropped up, it is in the Appendix on GC/ORCA that is being moved to the new chapter.
https://github.com/ponylang/pony-tutorial/blob/228ef53fec43b3182062c713077eb1682692db63/content/appendices/garbage-collection.md#L23
Currently I use the three states: alive, blocked, dead and have not made mention of muted yet (same problem as the scheduler thread problem that I do not have a "natural" place for it yet). I then reuse the term "dead" in the Quiescence section to differentiate between collecting an individual actor and collecting a cycle of actors (i.e., it takes a cycle of dead actors to GC them all at once).
As for "overloaded" and "under pressure" I think given that those are both backpressure related I would categorize them into that system as the cause of muting. Of course given this is the tutorial I am trying to toe that line of just enough information at one time to be understood. Not suggesting it yet, but I would almost "hide" those backpressure states for now and put all three: muted, under pressure, and overloaded together in a backpressure-related chapter/section/FAQ/Appendix/etc.
I will progress with the "Runtime Options" section.
@rhagenson well, apparently we are using "Dead" somewhere. I never knew that.
> As for "overloaded" and "under pressure" I think given that those are both backpressure related I would categorize them into that system as the cause of muting. Of course given this is the tutorial I am trying to toe that line of just enough information at one time to be understood. Not suggesting it yet, but I would almost "hide" those backpressure states for now and put all three: muted, under pressure, and overloaded together in a backpressure-related chapter/section/FAQ/Appendix/etc.
agreed.
@rhagenson I think that definition of dead is not quite right.
To be dead an actor also can't be registered with the asio system to receive events.
Our current definitions don't really take that into account.
Perhaps
Alive/Dead would be a good distinction
- Alive: Has messages in its queue or can receive messages (this includes ASIO events)
- Dead: Has no messages in its queue, nor can it receive messages.
Running or Scheduled or Executing/Blocked/Waiting/Muted
Where:
- "Running or Scheduled or Executing" is "currently processing its message queue"
- Blocked is as you said
- Waiting is "in the run queue for a scheduler with messages to process"
- Muted is "not in a run queue for a scheduler; may or may not have messages in its queue"
An Alive actor can be Running, Blocked, Waiting or Muted. A Dead actor can only be Blocked or Muted. (Although I'm not sure if the current implementation would consider a muted actor to be able to be collected by the GC; I would have to see what I did when I implemented muted.)
@SeanTAllen My response below just grew and grew, so there is a lot to respond to here.
First, to be sure there was no typo, did you mean to say Alive is having messages or the ability to receive messages rather than having no messages?
So to summarize my understanding, it would shake down as (borrowing <: "subtype of" notation):
Running <: Alive
Waiting <: Alive
Executing <: Alive
Muted <: Alive
Muted <: Dead
Blocked <: Dead
Therefore, Muted is the only subtype that can be applied to either Alive or Dead actors. From your definitions, I merged Scheduled into Waiting as I do not understand the distinction between "waiting for scheduling" and "being scheduled" (the latter of which I assume places an actor as Running). I make the distinction between Running and Executing as a loosely semantic one: "in a behavior" (Executing) versus "has control of a scheduler thread" (Running), so an Executing <: Running might then still be technically correct.
Running and Scheduled are not GCed. Blocked and Muted are grounds for GC, Dead is GCed as soon as possible. A backpressure transition due to overload places the actor into Waiting. (I want to say backpressure "kills" actors, but that is not the case given the Alive/Dead supertype names we are using here.)
Anything in here that I missed or is not consistent with your view?
@rhagenson yes, Alive should have been "has messages in its queue". I've edited accordingly.
Running and Executing are the same thing.
Blocked is applicable to both Alive and Dead. Either can be blocked. But a blocked actor can be alive in that it can receive messages still.
Running <: Alive
Waiting <: Alive
Muted <: Alive
Muted <: Dead
Blocked <: Alive
Blocked <: Dead
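The corrected hierarchy above can also be written as a small lookup table. This is a Python sketch for illustration only; the enum and function names are hypothetical and do not correspond to anything in the runtime, which (as noted earlier in this thread) has no agreed-upon terminology for all of these states.

```python
from enum import Enum


class Liveness(Enum):
    ALIVE = "alive"
    DEAD = "dead"


class Scheduling(Enum):
    RUNNING = "running"
    WAITING = "waiting"
    MUTED = "muted"
    BLOCKED = "blocked"


# Which scheduling states each kind of actor may be in, taken directly
# from the "<:" list above.
ALLOWED = {
    Liveness.ALIVE: frozenset(
        {Scheduling.RUNNING, Scheduling.WAITING,
         Scheduling.MUTED, Scheduling.BLOCKED}
    ),
    Liveness.DEAD: frozenset({Scheduling.MUTED, Scheduling.BLOCKED}),
}


def is_valid(liveness: Liveness, scheduling: Scheduling) -> bool:
    """Return True if the combination appears in the hierarchy above."""
    return scheduling in ALLOWED[liveness]
```

Read this way, Muted and Blocked are the two states shared by Alive and Dead actors, while Running and Waiting apply only to Alive actors.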
Got it. Consistent on the view of GCing as well? Dead actors are GCed, while Alive actors in Blocked/Muted are possibly GCed?
Only Dead actors can be GCed. Alive means they can't be GCed because they are still capable of receiving messages.
- Dead: can be GCed
- Alive: can not be GCed
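As a rough illustration of that rule, here is a Python sketch that uses the Alive/Dead definitions from earlier in this thread. It is a simplified model under stated assumptions: the `ActorSnapshot` type and its fields are hypothetical, and "can receive messages" is reduced to two booleans rather than anything the runtime actually tracks in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorSnapshot:
    """Hypothetical point-in-time view of one actor."""

    queued_messages: int           # messages waiting in its queue
    can_receive_from_actors: bool  # some other actor could still send to it
    registered_with_asio: bool     # could still receive ASIO events


def is_alive(actor: ActorSnapshot) -> bool:
    # Alive: has messages in its queue or can receive messages
    # (this includes ASIO events).
    return (
        actor.queued_messages > 0
        or actor.can_receive_from_actors
        or actor.registered_with_asio
    )


def can_be_collected(actor: ActorSnapshot) -> bool:
    # Only Dead actors can be GCed.
    return not is_alive(actor)
```

Under this model, an actor with an empty queue that is still registered with the ASIO system counts as Alive and is never a candidate for collection.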
I had a rebuttal based on backpressure and on what I had written so far in the chapter for quiescence. However, after rereading what I wrote alongside the hierarchy here, it all agrees that an actor must be Dead to be GCed. All Alive states are some form of the actor still being active, so whether or not it is Waiting due to backpressure, that will not result in GC to reallocate resources from the cooperative scheduler.
Thank you for helping me clarify these states!