language
language copied to clipboard
Running code in an isolate group rather than isolate
During the recent meeting we have discussed progress of shared native memory multithreading prototype. One of the topics of contention was behavior of the Dart code which executes within an isolate group but outside any isolate.
As currently described in the proposal and implemented in the prototype an attempt to access static state not marked as shared by the code which is running outside of the isolate throws an exception. For example:
int i = 0;
void foo() {
i++; // (*)
}
void main() {
Isolate.runShared(() {
foo(); // Throws AccessError at (*)
});
}
Language team raised a concern about this behavior - we will use this issue to discuss arguments for and against this behavior.
This behavior was chosen in the original proposal because in our (implementors opinion) it makes behavior of callbacks entering Dart from native on an arbitrary thread cleaner. The only other viable option is to create a temporary isolate which would exist for the duration of the execution and is destroyed after the call.
- Creating temporary isolate is fast but not free: baseline cost is measured in tenths of microseconds, plus additional cost of initializing statics which are accessed by the callback itself. Such initialization will also generate garbage as isolate is destroyed making this approach unsuitable for any sort of high-frequency callbacks.
- State silently resetting between callback invocations can lead to hard to understand bugs - as opposed to developer immediately hitting a runtime error.
If users want to use code that requires isolated static state - they can always manage that explicitly, e.g. something along the line of
void callback() { // invoked inside a group, but not specific isolate
final isolate = Isolate.create(); // Can even cache this if needed.
try {
isolate.run(() {
// use static isolate state.
});
} finally {
isolate.destroy();
}
}
Essentially we are giving developers a way to write code which they currently need to write in C/C++ using embedding API.
cc @lrhn @leafpetersen
cc @aam @mkustermann
My main issue with the "code crashes if you touch non-shared static variables" is that it's an invisible and uncontrollable limit on what you can do inside such a not-real-isolate.
Unless the code you run is completely authored by yourself, which it never is because you will use platform libraries, there is no way to predict whether code will crash. You can run it and see, and hope that it doesn't only update its static cache on Thursdays.
If you choose to rely on code you haven't written, your code can break in the very next update, if that third-party code decides to access a non-shared static variable. Accessing static variables is not considered a breaking change, so nothing will prevent others from doing it. And accessing a new static variable must not be considered a breaking change. That's far too onerous a restriction. (I'll personally be happy to close any issue on core packages for not accessing static variables with a "No thanks".)
That is: The vast majority of code, and all code written so far, has no incentive to make itself compatible with shared isolates. It's tricky, it may not be possible while retaining functionality, and it may be costly (I assume a shared variable will be protected by locks.)
The code you can use in a shared isolate will be very limited. That doesn't worry my as much as it being fragile, if you don't even know that the code you have tested will keep working. You might be incentivized to pin yourself to specific version numbers of things you depend on, because those are known to work, and a patch-level increment of a package might stop working.
How isolategroup-shared callbacks restriction on non-shared static field access is different from, for example, restriction on what can be sent in a message to an isolate? Some class from some third-party package didn't use to have some non-sendable state in it, now it does, so your code that used to send an instance of that class will now throw and have to be changed. That seems to be acceptable.
Basically, isolategroup-shared callbacks have limitations, but they are very useful for integration with native libraries(remove a need to write/build native code). If errors are clear and actionable(send message to an isolate that would run the code that accesses non-shared static field), why rob users of this ability?
@mraleph I think I could get comfortable with this if it was made more explicit that this is not intended to be a way to run arbitrary Dart code. So something like:
- Change the name from
Isolate.runSharedto something likeIsolate.unsafeRunWithoutStatewith the emphasis on having the wordunsafein it (much like Haskell'sunsafePerformIO). - Explicitly call out in the documentation that it is not recommended to run any code that you do not control within that isolate (including core library code)
- Explicitly call out in the documentation that any code that you run which you do not control within that isolate might suddenly stop working at any point, and that this is explicitly not part of the breaking change contract of any code (including the core libraries)
- Explicitly include in our breaking change documentation that changes to the core libraries which cause code running in unsafe shared isolates to break are outside of the scope of our breaking change policy.
@sigmundch This feels connected to things you've been exploring around limiting what code dynamic modules can access.
Here's some of my thoughts:
We want to use shared memory multithreading support for callbacks from C, especially for synchronous callbacks where C calls Dart and expects a synchronous return value. This may require the Dart code to do arbitrary work - sometimes it may do very little work, sometimes it may do much more work.
I'd argue that those callbacks should be able to use RegExp, Uri.parse(), protobuf decoding, print(), ... - all of which would throw-at-access atm because they rely on global fields. So we need to to have a solution for this.
If we created a fresh static field state for each callback invocation, then
- each invocation of the callback may require
O(<size-of-program>)memory state to be allocated & default-initialized - plus the time it takes to initialize all these global fields that are actually accessed - the
print()function (and some other things) would still not work, because those are initialized by the embedder on normal isolate creation, so we'd need to add even more overhead in terms of allocation & initialization
For high-frequency C->Dart calls this is just way too much overhead. e.g. Imagine C calls Dart and passes a simple protobuf message describing some state and Dart should return a boolean whether proceed or not. Now the overhead we'd be adding is enormous: We'd maybe allocate xx KB of memory for static field state & default initialize it, lazy-initialize various static protobuf metadata fields on-first-access and finally decode a few bytes of proto message and return a boolean.
=> IMHO We should have a solution where the callbacks perform only the work they actually need to perform - no O(<program-size>) extra memory overhead and no O(<program initialization>) time.
So to make this more performant I'm convinced that we want to make any global fields (corelib & user-defined static fields) that are on the hot path to be initialized only once and ready-to-use for all callback invocations. We have the new shared fields for that. Whatever those fields on the hot path contain has to be sharable (and we should have a discussion whether that can be mutable dart objects as well - possibly via opting in individual classes, ...)
Once we have this "run a dart closure with zero extra overhead" concurrently with normal dart isolate, we have a powerful mechanism. We're going to use that mechanism then also to parallelize programs: Create N threads, do some work in parallel and join those N threads (similar to using lightweight isolates today -- but now avoiding the extra O(<program-size>) memory and O(<program initialization>) time as well as more sharing of state)
Coming back to the original issue: I can see the point that users will have a hard time to predict what packages, libraries or core library features they can use in C callbacks - or whether they may get exceptions -- and how they may silently get broken with a pub upgrade. We cannot avoid this problem entirely as code that triggers microtasks, timers, anything async will fail - as there's no event loop running after the callback returns to C.
What I could imagine is that we don't throw on normal global field access but
- make the cost
O(<number of fields accessed>)memory as opposed toO(<number of fields accessed>)memory - have 0 eager work done by embedder (e.g. installing
printclosure)
Currently the VM represents the static fields of a program as a large array indexed by a field id - this is very fast to access. An isolate creation involves a copy of that large array with default values & sentinels. Though one could make shared isolates use a different representation that can grow as the number of fields accessed grows (e.g. hashmap). That would make us only pay for what's actually used (and a little code size cost and a little access time cost for normal isolates accessing normal static fields - a "am I shared isolate" bit test and branch)
All the state we currently initialize by embedder on isolate creation (e.g. print closure, etc) will be migrated to use shared fields.
That would make us end up in a place where the overhead of invoking a callback is purely based on what that callback needs (no extra embedder setup, no setup for fields that aren't used, ...) while still not throwing when accessing normal static fields. When profiling, one will observe slowness due to using non-shared field initializers and start migrating them to be shared fields.
So to make this more performant I'm convinced that we want to make any global fields (corelib & user-defined static fields) that are on the hot path to be initialized only once and ready-to-use for all callback invocations. We have the new shared fields for that. Whatever those fields on the hot path contain has to be sharable (and we should have a discussion whether that can be mutable dart objects as well - possibly via opting in individual classes, ...)
The problem is then that some static variables are mutable and not shareable.
What if we had a special "initialize-once" shared variable, which does have per-isolate state, but it is guaranteed to be able to share the initialization value, so if that requires complicated computation, it'll still only happen once. Then the resulting value is cached and the next initialization is just "read value from isolate-shared cache".
If we used a SharedVariable class to represent shared variables, and maybe an IsolateVariable to represent per-isolate state that is accessible in a shared isolate, then that could be:
const _uriParserTable = IsolateVariable.late(_createParserTables);
which would invoke that function the first time it's read, store the value in a real shared variable (or a global authority array of initialization values), then initialize the isolate variable with that value.
You will need O(#sharedVariables) shared space, O(#isolateVariables) per-isolate state initialized, which can be copied from a global authority array for each shared isolate creation, or initialize to null and copy on first use, but you won't be able to access arbitrary static variables that are not declared as IsolateVariable.
Unless everybody makes their static variables into IsolateVariables, it's less of an issue than accessing all static variables.
as code that triggers microtasks, timers, anything async will fail - as there's no event loop running after the callback returns to C.
And it won't even get that far, since it tries to read the Zone._current static variable before doing anything asynchronous. That variable cannot be shared, it must not be mutated concurrently, and must have the correct value for each concurrent thread. It is per-exection thread, which currently means per-isolate.
(And for the record: Uri.parse has been fixed, it now uses a const String table.)
@leafpetersen
@mraleph I think I could get comfortable with this if it was made more explicit that this is not intended to be a way to run arbitrary Dart code. So something like:
What if there is no Isolate.runShared only NativeCallable.shared (or some other name, e.g. isolateGroupLocal or NativeCallable.withoutIsolate or similar) which creates a native callback which is invoked in an isolate group rather than an isolate? Of course we would then document all the restrictions on what it means to run without isolate state.
I would really like to avoid unsafe as a prefix. It does not communicate what is actually unsafe about the function - so it does not add any value except looking dangerous.
We already have NativeCallable.isolateLocal which just crashes the process when invoked on a wrong thread - but we did not prefix that with unsafe. Similarly SendPort.send(o) throws an exception if o transitively references something unsendable but it is not called SendPort.unsafeSend(o).
It does not communicate what is actually unsafe about the function - so it does not add any value except looking dangerous.
Looking dangerous is the value. Seriously. runShared communicates nothing about the function either (what does "shared" mean anyway? it is no more shared than any other isolate is), but it specifically does not communicate that the isolate that you are about to create has non-standard and dangerous semantics. I'm not particularly set on the specifics of the naming, but the whole point is that you should not be able to pick this thing up without it being immediately obvious that you are in danger of losing a toe.
NativeCallable.shared
Something like this is probably reasonable. In general I think that going through a native interface is a good signal that you're holding something sharp. I still don't understand the "shared" terminology.
Similarly
SendPort.send(o)throws an exception ifotransitively references something unsendable but it is not calledSendPort.unsafeSend(o).
Perhaps it should? But SendPort only has one API, so you can't accidentally stumble onto the wrong one. If there were SendPort.send which worked in all cases and SendPort.sendShared which blew up occasionally I think I would have the same concerns.
Another possible approach might be to just add an optional parameter to the existing Isolate.run? Something like {..., bool throwOnStaticAccess = false}?
NativeCallable.shared
I still don't understand the "shared" terminology.
So far the name for the ffi callback that we've been using is NativeCallable.isolateGroupShared. Meaning that it's the isolate_group state that is shared - such ffi callback has access only to that. The states of all isolates remains private, isolated, not shared, not accessible to such ffi callbacks.
Meaning that it's the isolate_group state that is shared
But that state is shared with all of the other isolates as well. That is, the distinguishing feature of the isolate you get from calling this api is not that it has access to shared state - all of the isolates in the group have access to that state. The distinguishing feature is that it does not have access to any other state of its own.
Here's my thoughts for what they are worth...
Semantics
I agree with Lasse that a kind of code that crashes at runtime if it happens to touch any static fields, even those encapsulated by arbitrarily distant third-party dependencies, is going to be extremely unfun to write and maintain.
I definitely do not think we should tell package maintainers that using any static state should be considered a breaking API change. Instead, people writing this kind of native interop code should bear the burden of dealing with this brittleness.
Thinking about it, this style of programming reminds me of writing interrupt handlers. In some sense, these isolates are interrupt handlers. Native code is calling into the Dart code and we want to reply as fast as possible with as little set up work as we can get away with, and we're willing to accept that that code is running in a very bizarre, incomplete shadow world. If you touch anything wrong, you get shocked. Just try to return back as quickly as you can before you hurt yourself.
This doesn't sound like a fun user experience. I mean, I've never even written an interrupt handler myself and I still know the horror stories of how difficult they are to maintain.
At the same time... there is a reason they were created. If we think we have a similar compelling use case for Dart, it sounds reasonable to me. We'll just have to telegraph very clearly to anyone writing these things that they have to be very careful about what code they use.
We may want to document in the core libraries which APIs are "interrupt safe" (with a better name, obviously).
Names
Names matter. Correct me where I'm wrong, but here's my understanding of the terminology:
-
"Isolate": When talking about "isolate" in general, we include
spawnUri()isolates where each has its own copy of its code and static state. Different isolates may be running entirely different code. Because of that, the only objects that can be passed between them are a small list of known-safe core library types. These are sort of like spawning processes and communicating over pipes. -
Isolates in an "isolate group" all run the exact same code. As such, they are able to send instances of objects between them. Many but not all kinds of objects can be sent. Each isolate in the group still has its own copy of any static state. I'm guessing that in most real-world use of isolates today, they are all in the same group?
-
A proposed "shared field" is a static variable whose value is shared across all isolates in the group. If one mutates its value, the others see the change. Like the opposite of "thread local".
-
A proposed "shared isolate" runs in the context of an isolate group. It can access and modify shared fields in that group but crashes at runtime if it tries to access any unshared (i.e. normal) static fields.
I think "shared field" works. (Though it sort of undermines the original name "isolate". But then that was never a great name since it describes what the thing can't do instead of what it can.)
I find "shared isolate" confusing. The point isn't that it can access shared state. All isolates can do that. The point is that it can't access unshared state. It doesn't have any of its own static state. It's fast because it doesn't have the overhead of spinning up its own static state.
In other words, the reason we're taking away the capability to have its own static state is because running without that overhead is faster. So how about something that emphasizes that. Maybe "bare isolate" or "stateless isolate"?
Overall, I really appreciate all the thought and work that's gone into this proposal. Concurrency isn't my wheelhouse but it seems like it's navigating a very difficult constrained problem space well.