It is necessary to implement threads

Open insinfo opened this issue 6 years ago • 184 comments

It is necessary to implement threads, because the majority of platforms that run Dart support threads, such as Android, iOS, Windows, Linux, macOS, *WebAssembly, etc.

There are many use cases where threads can be used easily and quickly to speed up the processing of tasks. I use threads a lot in Java (Android/Linux) and C#.

Threads can communicate with each other more easily than processes.

Isolates do not share compiled code or JavaScript objects and therefore cannot share mutable data the way pthreads do.

Although Isolates can provide concurrent programming behavior, they do not have the advantages of threads, because threads share memory and Isolates do not. Threads have many advantages over multiple processes: they consume less memory, and they are lighter and faster than processes. A process-based model is slower because you have to copy data between the concurrent parts of your program.

This does not mean that you have to abandon Isolates, but rather add another option for the most common cases, where you want to accelerate an algorithm that can be easily and quickly parallelized with threads.

  • It would be nice if Dart already had an AOT compiler or a VM for WebAssembly
  • WebAssembly threads ready to experiment

insinfo avatar Apr 27 '19 21:04 insinfo

The primary disadvantage of shared-memory concurrency is that you need to maintain a consistent memory across threads, avoid concurrent modifications and access to uninitialized or partially modified objects and provide concurrent allocation.

A secondary issue is that concurrency is not supported when compiling to JavaScript.

It would be nice to have some version of concurrency that is easier to use than isolates, even if it's only shared access to immutable memory or explicitly opt-in shareable memory. That's still a tough restriction for a language like Dart which has globally accessible mutable variables.

lrhn avatar Apr 29 '19 09:04 lrhn

@lrhn

A secondary issue is that concurrency is not supported when compiling to JavaScript.

But isn't that what service workers should be capable of?

MarvinHannott avatar May 08 '19 17:05 MarvinHannott

I can't agree more with @insinfo. I think the reason why threads were not implemented in Dart in the first place is that Dart was mainly focused on compiling to JavaScript (which doesn't support threads natively). But things have changed, and now most developers are targeting the Dart VM (via Flutter or directly), so the lack of proper thread support in Dart has really become an issue. As a developer, working with Isolates is not only painful, it is also highly ineffective when you have to copy (serialise/deserialise) a large collection of complex objects between isolates (for example if you have to perform a filtering/computation on a large collection of objects in an Isolate). So I think that proper support for threads should be a priority for the dart-lang team because, unfortunately, unlike other features currently missing in Dart (enums with value storage, protected visibility for variables and methods...), there is no alternative that we, as developers, can build ourselves to make up for the lack of thread support (Isolates are probably one of the things Flutter developers struggle with the most).

@lrhn

The primary disadvantage of shared-memory concurrency is that you need to maintain a consistent memory across threads, avoid concurrent modifications and access to uninitialized or partially modified objects and provide concurrent allocation.

This is the case for all languages supporting threads (=pretty much all modern languages) so it shouldn't be that much of an issue. Anyway, all devices are now multicore, so offering the possibility to share memory between processes seems essential for any language that wants to remain useful and used in the future.

A secondary issue is that concurrency is not supported when compiling to JavaScript.

To support threads in dartjs, you could probably take an approach similar to the one taken by the TeaVM team to compile Java bytecode to JavaScript. They fully support converting Java threads (and those of Kotlin, Scala and any other language that compiles to JVM bytecode) to JavaScript by transforming them into green threads (http://teavm.org/docs/intro/overview.html).

I really hope that Dart will offer full support for threads soon (for dartvm first, it can come to dartjs later), as this feature is essential for developing really powerful apps.

ramsestom avatar Nov 20 '19 18:11 ramsestom

@ramsestom

As a developer, working with Isolates is not only painful, it is also highly ineffective when you have to copy (serialise/deserialise) a large collection of complex objects between isolates (for example if you have to perform a filtering/computation on a large collection of objects in an Isolate).

Most of the time copying is really really cheap compared to the actual computation and shared memory isn't worth the headache. Also don't think thread synchronization comes for free! In fact, copying might not be any slower. There is also TransferableTypedData to move TypedData from one Isolate to another. Of course that isn't true for any other data structure since there is no copy constructor, so the Dart runtime wouldn't know how to properly copy a class instance (at least if it isn't a constant).

But if you truly need shared memory then you could use a pointer from dart:ffi. This might be a dirty trick but it absolutely works and you only have to copy the pointer address. Naturally it brings back all the problems with synchronization and you have to free the pointer manually. And in most cases it won't increase performance at all (rather the opposite).

The design of Isolates isn't ineffective at all; in fact from a GC's point of view it is very effective. It also makes other things like certain compiler optimizations and memory management simpler. And it practically eradicates race conditions. Many programming languages like Erlang or Go don't share memory and pass messages, and obviously they do fine. I also don't think the async features of Dart would play nice with thread synchronization. And (native system) threads wouldn't play nice with Dart's Observatory, which is a very powerful tool.

I would also be interested in why you think working with Isolates is more painful than working with threads in any other language. For all practical purposes Isolates can be viewed as threads. The API is just as "low level" as it has to be to satisfy all use cases. But there are higher level abstractions like the pub package isolate which make dealing with Isolates much easier.

The primary disadvantage of shared-memory concurrency is that you need to maintain a consistent memory across threads, avoid concurrent modifications and access to uninitialized or partially modified objects and provide concurrent allocation.

This is the case for all languages supporting threads (=pretty much all modern languages) so it shouldn't be that much of an issue. Anyway, all devices are now multicore, so offering the possibility to share memory between processes seems essential for any language that wants to remain useful and used in the future.

I would claim this is evidently not true. There are many languages which don't share memory and CSP libraries for C++ and Java are very popular.

MarvinHannott avatar Nov 20 '19 20:11 MarvinHannott

Most of the time copying is really really cheap compared to the actual computation and shared memory isn't worth the headache. Also don't think thread synchronization comes for free! In fact, copying might not be any slower.

I didn't say that synchronisation comes for free, but copying is not cheap. In terms of computation time, it isn't cheap when you have large data to copy (parsing a large JSON file and returning the result, decoding an image...), and even if the copy time is much smaller than the computation time taken by the whole Isolate process, copying the data back to the main Isolate (=the UI if you are in a Flutter app) might take enough time to make this UI Isolate hang in a way the end user notices. And in terms of memory, copying is not cheap at all, as you add another copy of your data for each new Isolate. You don't have these issues with threads that can share memory.

But if you truly need shared memory then you could use a pointer from dart:ffi. This might be a dirty trick but it absolutely works and you only have to copy the pointer address. Naturally it brings back all the problems with synchronization and you have to free the pointer manually. And in most cases it won't increase performance at all (rather the opposite).

Do you have any example on how to use dart:ffi to share pointers between Isolates? I would be interested even if it is a dirty hack. Anyway, if any Isolate can access any data produced by other Isolates given a pointer to this data in memory, it means that Isolates already have shared memory (it is just not synchronised). So having a proper implementation of Threads in Dart would just be a matter of correctly handling synchronized access to data declared as synchronized by the developer. Isn't it? Seems like something the Dart language should offer without having to rely on dirty hacks...

The design of Isolates isn't ineffective at all; in fact from a GC's point of view it is very effective. It also makes other things like certain compiler optimizations and memory management simpler. And it practically eradicates race conditions. Many programming languages like Erlang or Go don't share memory and pass messages, and obviously they do fine. I also don't think the async features of Dart would play nice with thread synchronization. And (native system) threads wouldn't play nice with Dart's Observatory, which is a very powerful tool.

Threads don't necessarily have to be native. Dart could choose to provide green threads, in which case threads would work pretty much like Isolates except that they would offer access to shared memory and ensure synchronised access to the parts of this memory declared as synchronised. In this case I think that Dart's Observatory would still play nicely with Dart threads. As for the async features of Dart, languages like Python or Java have memory shared between threads but also support async/await or Futures, so it shouldn't be an issue.

I would also be interested in why you think working with Isolates is more painful than working with threads in any other language. For all practical purposes Isolates can be viewed as threads. The API is just as "low level" as it has to be to satisfy all use cases. But there are higher level abstractions like the pub package isolate which make dealing with Isolates much easier.

When working with Isolates you not only have to create the send/receive ports each time you want two Isolates to be able to communicate, but you also have to ensure that each data structure you want to send through the port is serialisable/deserialisable, and write a routine in the sender to find and send the data requested by the receiver. Things can get a lot more complicated if you want to launch a new Isolate from another Isolate and have to pass data from one Isolate to another through the ports of an intermediate Isolate. With threads and shared memory everything is a lot easier: you can have static synchronized repositories for your data and access them from any thread, without the headache of asking the spawning thread for the data or passing through the whole thread hierarchy in case of cascading thread calls.

ramsestom avatar Nov 21 '19 00:11 ramsestom

it isn't cheap when you have large data to copy (parsing a large JSON file and returning the result, decoding an image...), and even if the copy time is much smaller than the computation time taken by the whole Isolate process, copying the data back to the main Isolate (=the UI if you are in a Flutter app) might take enough time to make this UI Isolate hang in a way the end user notices.

With TransferableTypedData there should be only one copy. While the child isolate copies the data, the main isolate shouldn't block; after all, ports work asynchronously. In theory the user shouldn't notice anything. If memory is a concern and you parse really, really huge data sets, then you might want to use C/C++/Rust, either through ffi or a native extension. Native extensions can work asynchronously as well. Might be worth a look.
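For what it's worth, here is a minimal sketch of the TransferableTypedData route, assuming only dart:isolate and dart:typed_data. The names `produceBytes` and `fetchBytes` are made up for illustration, and the 1 MiB buffer just stands in for a decoded image or similar.

```dart
import 'dart:isolate';
import 'dart:typed_data';

/// Child isolate entry point (hypothetical name): fills a 1 MiB buffer
/// and sends it back wrapped in a TransferableTypedData.
void produceBytes(SendPort replyTo) {
  final bytes = Uint8List(1024 * 1024);
  bytes.fillRange(0, bytes.length, 7);
  replyTo.send(TransferableTypedData.fromList([bytes]));
}

/// Spawns the child isolate and waits for its single reply.
Future<Uint8List> fetchBytes() async {
  final port = ReceivePort();
  await Isolate.spawn(produceBytes, port.sendPort);
  // `first` completes with the first message and then closes the port.
  final transferable = await port.first as TransferableTypedData;
  // materialize() may only be called once per TransferableTypedData.
  return transferable.materialize().asUint8List();
}

Future<void> main() async {
  final bytes = await fetchBytes();
  print('received ${bytes.length} bytes');
}
```

Note that this only helps for raw typed data; an object graph would still have to be flattened into bytes first.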

Do you have any example on how to use dart:ffi to share pointers between Isolates?

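// main isolate
// [calloc] is in package ffi, not dart:ffi
// (older versions of package ffi exposed allocate/free instead)
final ptr = calloc<Uint8>();
port.send(ptr.address);
...
calloc.free(ptr);

// child isolate
port.listen((message) {
  // Only the integer address crosses the isolate boundary.
  final ptr = Pointer<Uint8>.fromAddress(message as int);
});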

This pointer points to shared heap memory not managed by the dart runtime but by the operating system! And allocating memory and copying list elements into it is really slow. This is a pointer you would pass as a parameter to a C function. If you free the pointer while it is used in another isolate it leads to undefined behaviour. If you don't free it it leads to a memory leak. I wouldn't recommend doing this since it isn't more memory efficient or faster. Better use TransferableTypedData or consider going native in the first place.

As for the async features of Dart, languages like Python or Java have memory shared between threads but also support async/await or Futures, so it shouldn't be an issue.

But there are issues. Synchronization involves blocking. Python has the famous Global Interpreter Lock (or GIL) so the reference counter works as expected. Nobody uses threading in Python for that reason alone (except with PyPy which indeed uses green threads). And thread synchronization in Java can hurt asynchronicity as it is the direct opposite. So most web developers avoid it. You were worried about UI becoming unresponsive, yet blocking is the reason for that. Isolates have become much more lightweight than they used to be.

When working with Isolates you not only have to create the send/receive ports each time you want two Isolates to be able to communicate

Well, not if you just use the higher level abstractions offered by the Dart team.

you also have to ensure that each data structure you want to send through the port is serialisable/deserialisable

That might indeed be a pain point with nested data structures. But in most cases the data is serializable. As far as I know the Dart team is working on this.

Things can even turn a lot more complicated if you want to launch a new Isolate from another Isolate and have to broadcast data from one Isolate to another through an intermediate Isolate ports.

To be honest, I never encountered that scenario. And I am not entirely sure I can follow. You mean that two or more child isolates spawned by a parent isolate couldn't communicate with each other? I know that a ReceivePort can be transformed into a broadcast stream with many subscribers. I think it should work across isolates.
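As a rough sketch of the sibling scenario: a SendPort is itself sendable, so a parent can hand one child's port to another and let them talk directly, without relaying every message. The worker names and `relayDemo` below are hypothetical, and this is just one possible wiring.

```dart
import 'dart:async';
import 'dart:isolate';

/// Worker B (hypothetical): reports its own SendPort to the parent, then
/// forwards the first message it receives back to the parent, tagged.
void workerB(SendPort parent) {
  final inbox = ReceivePort();
  parent.send(inbox.sendPort);
  inbox.listen((message) {
    parent.send('B got: $message');
    inbox.close(); // lets this isolate exit
  });
}

/// Worker A (hypothetical): receives B's SendPort and talks to B directly.
void workerA(SendPort toB) {
  toB.send('hello from A');
}

Future<String> relayDemo() async {
  final fromB = ReceivePort();
  final messages = StreamIterator<dynamic>(fromB);
  await Isolate.spawn(workerB, fromB.sendPort);
  await messages.moveNext();
  final toB = messages.current as SendPort; // first message: B's port
  await Isolate.spawn(workerA, toB);
  await messages.moveNext();
  final result = messages.current as String; // second message: B's report
  await messages.cancel(); // cancelling the subscription closes the port
  return result;
}

Future<void> main() async {
  print(await relayDemo());
}
```

The parent is only involved in the initial handshake; after that A and B exchange messages on their own.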

MarvinHannott avatar Nov 21 '19 14:11 MarvinHannott

@lrhn About the whole compile to JS thing.

Shared memory multithreading is compatible with JS, because it can compile to single threaded code. It basically won't break anything, it just won't provide any performance benefits on the web.

I don't see why this should be an issue when the web will underperform other Dart runtimes anyhow.

GabrielRatener avatar Nov 28 '19 22:11 GabrielRatener

A feature which performs badly can be worse than no feature at all.

The end result is still guidelines saying "don't use X, it's too slow", so users will not get the benefits of feature X. They'll still pay for the extra implementation overhead anyway, and the mental overhead of knowing, and then ignoring, the feature.

Or, in other words, I'd rather do nothing than do something badly.

lrhn avatar Nov 29 '19 08:11 lrhn

I believe the future is WebAssembly, because soon WebAssembly will have direct access to the browser Web APIs and Dart will no longer need to compile to JavaScript.

insinfo avatar Dec 04 '19 22:12 insinfo

@lrhn About the whole compile to JS thing.

Shared memory multithreading is compatible with JS, because it can compile to single threaded code. It basically won't break anything, it just won't provide any performance benefits on the web.

I don't see why this should be an issue when the web will underperform other Dart runtimes anyhow.

That indeed would work, but might not be what the developer wants or expects. If responsiveness / time to compute weren't an issue we wouldn't consider parallelism in the first place. So one could also argue that web workers should be used, though they don't use shared memory.

If we really want to keep JavaScript as a compile target we also ought to have similar semantics. Though to be honest, with TypeScript and other better suited solutions out there, I don't see why we should care about this any longer. Outside of AngularDart no one uses Dart as a compile to JavaScript solution, and even AngularDart is rarely used. Just my humble opinion.

MarvinHannott avatar Dec 05 '19 12:12 MarvinHannott

you also have to ensure that each data structure you want to send through the port is serialisable/deserialisable

That might indeed be a pain point with nested data structures. But in most cases the data is serializable. As far as I know the Dart team is working on this.

A Huge/Mega/Colossal/Epic "pain point"; in some cases I/we have to move mountains to overcome the issues that come with mandatory "serialisable" data.

Besides, Java for instance has built-in concurrent versions of the most used data types. I never had any issues with concurrency in Java even though I used it heavily...

Qt has a similar mechanism with signals/slots, but you can transfer any type as long as you register it, AFAIK. No limitations...

Dart IS limiting, enforcing developers to its own way of thinking; instead, it should provide me with everything and let me worry about its downsides or intricacies, but not limit me...

ebesirik avatar Feb 17 '20 16:02 ebesirik

@nomercy78 I agree with you

Dart IS limiting, enforcing developers to its own way of thinking

insinfo avatar Feb 18 '20 01:02 insinfo

@nomercy78

A Huge/Mega/Colossal/Epic "pain point"; in some cases I/we have to move mountains to overcome the issues that come with mandatory "serialisable" data.

Besides, Java for instance has built-in concurrent versions of the most used data types. I never had any issues with concurrency in Java even though I used it heavily...

https://api.dart.dev/stable/2.7.1/dart-isolate/SendPort/send.html

The content of message can be: primitive values (null, num, bool, double, String), instances of SendPort, and lists and maps whose elements are any of these. List and maps are also allowed to be cyclic.

Most dart:collection types can be easily converted to a List or Map. Converting user defined types can be a hassle, but there are libraries for automatic code generation. I don't see how anybody has to "move mountains". Though I would see the point in a sendable interface to abstract the serialization away from the user.
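To make the conversion step concrete, here is a small hand-written sketch, staying within the message types quoted from the send() docs above. The `User` class, `birthday` worker and `roundTrip` helper are all made up for illustration; a code generator would produce the toMap/fromMap part for you.

```dart
import 'dart:isolate';

/// Hypothetical user-defined type with hand-written map conversion.
class User {
  final String name;
  final int age;
  const User(this.name, this.age);

  Map<String, Object> toMap() => {'name': name, 'age': age};

  factory User.fromMap(Map<dynamic, dynamic> m) =>
      User(m['name'] as String, m['age'] as int);
}

/// Child isolate entry point: args is [SendPort replyTo, Map user].
void birthday(List<Object> args) {
  final replyTo = args[0] as SendPort;
  final user = User.fromMap(args[1] as Map);
  // Only primitives, lists and maps cross the port in either direction.
  replyTo.send(User(user.name, user.age + 1).toMap());
}

Future<User> roundTrip(User user) async {
  final port = ReceivePort();
  await Isolate.spawn(birthday, <Object>[port.sendPort, user.toMap()]);
  // `first` completes with the reply and closes the port.
  return User.fromMap(await port.first as Map);
}

Future<void> main() async {
  final older = await roundTrip(const User('Ada', 36));
  print('${older.name} is now ${older.age}');
}
```

For a flat class this is a few lines; the hassle grows with nesting, since every nested type needs the same treatment.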

There have been good arguments why system threads are not part of the platform, although they evidently work fine. That is not the point. If anybody urgently needs threads, they could pull it off with a native extension. Shouldn't be too hard either.

MarvinHannott avatar Feb 18 '20 18:02 MarvinHannott

If Dart ever looks at implementing threads, the team should look at Java's Loom. Some good reading on Loom.

pratikpparikh avatar Jun 27 '20 21:06 pratikpparikh

the Dart team could at least make an extension or a separate optional package that implements threads

insinfo avatar Jul 09 '20 19:07 insinfo

Concurrency is not a feature you can just add with a package.

The Dart design is completely single-threaded, and adding support for concurrency would require adding a memory model that can handle concurrency. That's something which Dart doesn't currently have.

I'd go for having a much more efficient implementation of Isolate instead. An Isolate.spawn() operation should be able to be quite efficient.

lrhn avatar Jul 10 '20 08:07 lrhn

the Dart team could at least make an extension or a separate optional package that implements threads

@insinfo I found the following package https://pub.dev/packages/threading.

pratikpparikh avatar Jul 14 '20 17:07 pratikpparikh

Thumbs up for better parallelization options in Dart, since the Isolate model is indeed limiting and does not provide efficient and effective ways to benefit from multi-core CPUs.

Over the last 15 years of processor development, increased parallelism (number of cores) has been the primary driver of performance growth. A typical smartphone now has an 8-core CPU, and AMD and Intel offer consumer products with 16 cores. Imagine a single-threaded Dart program in 2025 running on a mid-range smartphone using one core out of the 16 available... Technically there are multiple helper threads in Dart, such as those running the GC, though they are not available to devs and don't help with code parallelism.

A few remarks on the above posts:

  1. People say data gets copied when transferred between isolates. That's not completely true. Unless we deal with raw bytes (such as TypedData collections), data gets serialized/deserialized on both ends, which is not as fast as a simple byte copy. Even string conversions via utf8.encode/decode (especially with non-ASCII strings) can add significant overhead. 99% of the time developers deal with object graphs, not raw bytes, and marshaling those objects between isolates is far more expensive than sharing memory.

  2. Indeed, modern hardware makes it possible to ignore the overheads associated with certain programming paradigms, such as parallelism with isolates in Dart (or data immutability in Redux): extra CPU cycles or bytes of memory won't be noticed in typical use cases. Still, I can imagine a few scenarios (actually real ones I faced with Flutter) where Dart's Isolates can be a dead end or force devs into complicated workarounds:

    2.1. In a Flutter app where we want to target 120 FPS, no single frame (or item from the message queue) can run for more than 8 ms. One of the clicks in the app triggers an HTTP request, which returns JSON that is deserialized into a list of Dart objects. Just as the Flutter docs recommend, we use the compute() function to create an isolate, make the HTTP request there, run the JSON deserialization and return the result to the main isolate. If the JSON happens to contain hundreds of thousands of objects, receiving the list (of Dart objects already deserialized from JSON in the secondary isolate) on the main isolate can easily take tens or hundreds of milliseconds, and Flutter's promised high FPS is not achievable in this scenario.

    2.2. If one needs to iterate through a large list of objects and aggregate some value (e.g. compute sum on a few fields), using threads and shared memory can give nearly X times performance boost (where X is the number of cores/threads available). With isolates there're barely any options to efficiently utilize multiple cores. E.g. in .NET there's TPL (Task Parallel Library) which aims to help with that kind of cases via Tasks (which are C# alternative to Futures) running in a thread pool.

  3. Apparently with Dart there's a tradeoff: you either simplify the async/await/concurrency model and make it easier to create simple and stable code, OR you deal with threads and all the associated evils (mutexes, critical sections, deadlocks, race conditions, etc.). And you either have the language easily transpiled to JS (which is single-threaded) or you don't target the Web.

    3.1. It seems WASM threads are coming to browsers (https://github.com/WebAssembly/threads), and there are requests to add a WASM thread implementation to Mono (https://github.com/mono/mono/issues/12453) and to Microsoft's Blazor framework (running on top of Mono).
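To illustrate the aggregation case from point 2.2, here is a rough sketch of what a multi-isolate sum looks like today, assuming Dart 2.19 or newer for Isolate.run. The helper names and the default of 4 workers are arbitrary choices for the example, not a recommendation.

```dart
import 'dart:isolate';

/// Sums one slice in a fresh isolate. Kept as a separate top-level helper
/// so the closure passed to Isolate.run captures nothing but `slice`.
Future<int> sumInIsolate(List<int> slice) =>
    Isolate.run(() => slice.fold<int>(0, (a, b) => a + b));

/// Splits `values` into up to `workers` slices and sums them in parallel.
Future<int> parallelSum(List<int> values, {int workers = 4}) async {
  final chunk = (values.length / workers).ceil();
  final partials = <Future<int>>[];
  for (var start = 0; start < values.length; start += chunk) {
    final end =
        start + chunk < values.length ? start + chunk : values.length;
    // Each slice is copied into the worker isolate when it is sent.
    partials.add(sumInIsolate(values.sublist(start, end)));
  }
  final sums = await Future.wait(partials);
  return sums.fold<int>(0, (a, b) => a + b);
}

Future<void> main() async {
  final values = List<int>.generate(1000, (i) => i);
  print(await parallelSum(values));
}
```

Every slice is copied on the way in, which is exactly the overhead discussed in point 1: for a cheap operation like a sum, that copy can easily cost more than the work it parallelizes.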

maxim-saplin avatar Mar 28 '21 21:03 maxim-saplin

I want to see native threads in Dart. Isolates are very limiting in Dart, especially since the language is stable for desktop platforms.

There are many reasons why we would want to see native threads :

  • Data sharing between threads in critical operations - where copy or serialize/deserialize can be too slow
  • Ease of writing - I personally find it easier / faster to write critical sections with threads than with Isolates. This is because with threads I only have to take care of sensitive data (mutexes and locks where read/write operations can be done at the same time by two threads). Isolates make you write a lot of code, since you have to create a message / port type for each piece of data you want to share, even if it's not critical data.
  • Since it is more permissive, threads allow programmer to be more powerful, that's what we expect from a programming language.
  • ....

Of course, that would probably imply modifying or adapting Dart's memory model. But AFAIK it seems necessary if Dart wants to conquer the desktop application world. That would be the best improvement to the language right now. People who want a safe concurrency model could then use Isolates. But for a lot of specific applications, we need those old unsafe threads.

johannphilippe avatar Apr 16 '21 17:04 johannphilippe

@johannphilippe I fully agree

where copy or serialize/deserialize can be too slow

insinfo avatar Aug 02 '21 20:08 insinfo

I'm actually thinking about something probably more fitted to Dart : a SharedMemory class that could be accessed from different isolates :

  • No GC (user responsibility to free this shared memory)
  • Isolates locks
  • Could be shaped like a Map (String keys to access memory blocks)

For example :

// Hypothetical API sketch; no such class exists in the Dart SDK.
abstract class SharedMemory {
  void allocate(String blockName, int numBytes);
  void lock(String blockName);
  void unlock(String blockName);
  void write(String blockName, dynamic data, {int indexOffset = 0});
  dynamic read(String blockName, int indexOffset, int sizeOfData);
  void free(String blockName);
}

That is only a prototype, but such a class could solve a big part of the problem here. I don't even know if that would be possible to create such a class on the user side, since Isolates are so... isolated.

And still, native threads would be better to me.

johannphilippe avatar Aug 03 '21 09:08 johannphilippe

I don't even know if that would be possible to create such a class on the user side, since Isolates are so... isolated.

You can relatively trivially create such a class using dart:ffi (or in general: you can easily share data between isolates as long as it lives outside of the Dart heap).
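A bare-bones sketch of the underlying idea, under some loud assumptions: it targets a POSIX platform where the C library's malloc/free can be looked up through DynamicLibrary.process() (this will likely not work as-is on Windows), it needs Dart 2.19+ for Isolate.run, and `shareByAddress` and `writeInIsolate` are made-up names. There is no locking here at all.

```dart
import 'dart:ffi';
import 'dart:isolate';

// Assumption: malloc/free are visible in the current process (Linux/macOS).
final DynamicLibrary _libc = DynamicLibrary.process();
final _malloc = _libc.lookupFunction<Pointer<Void> Function(IntPtr),
    Pointer<Void> Function(int)>('malloc');
final _free = _libc.lookupFunction<Void Function(Pointer<Void>),
    void Function(Pointer<Void>)>('free');

/// Writes `value` into native memory at `address` from another isolate.
/// Only the integer address is sent, never the bytes themselves.
Future<void> writeInIsolate(int address, int value) => Isolate.run(() {
      final remote = Pointer<Uint8>.fromAddress(address);
      remote[0] = value;
    });

/// Allocates one native byte, lets a second isolate overwrite it, and
/// returns what the first isolate reads afterwards.
Future<int> shareByAddress() async {
  final buffer = _malloc(1).cast<Uint8>();
  try {
    buffer[0] = 1;
    await writeInIsolate(buffer.address, 42);
    return buffer[0];
  } finally {
    // Freed only after the other isolate is done with it.
    _free(buffer.cast<Void>());
  }
}

Future<void> main() async {
  print(await shareByAddress());
}
```

A real SharedMemory class would still need named blocks and some form of locking on top of this, and it only ever shares raw bytes, not Dart objects.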

mraleph avatar Aug 03 '21 12:08 mraleph

@johannphilippe

I don't think this is a solution, or that it is efficient; there has to be a native thread implementation to be able to share Objects/Structs.

insinfo avatar Aug 03 '21 20:08 insinfo

@insinfo That is what I said. But since it doesn't seem to be a focus of the Dart team, I think it's still useful to try a few workarounds.

@mraleph Never tried that way. I'll try it soon.

johannphilippe avatar Aug 04 '21 06:08 johannphilippe

I am screaming for threads too

Xyncgas avatar Sep 23 '21 16:09 Xyncgas

At the request of @mraleph, and following the discussion in #46754 with @gmpassos and @mkustermann, here is a proposal: SendPort.sendPointer(Object). This method would send a pointer to the Object. The Object becomes unavailable in the sending Isolate (becomes null? Implements a particular interface?).

This would allow the transmission of big, user-defined data across Isolates without serialization, and without Isolate.exit()

The main problem with this approach is of course how to make the Object unavailable.

mtc-jed avatar Mar 04 '22 11:03 mtc-jed

Maybe immutable objects can be shared without removing them from the sender. This will require ways to build immutable objects, or at least truly immutable collections.

It's an easier approach than checking whether the object can really be removed from the sender, which would demand a GC pause (I think).

gmpassos avatar Mar 04 '22 11:03 gmpassos

Making objects unavailable in the sending isolate is close to impossible. If there are any non-weak references to the object, those references need to keep pointing to something. Object references preserving identity is fundamental in Object Oriented programming.

Also, it's usually not just one big object containing a lot of integers and strings. A big data object is big because it contains lots of other objects, which would also get sent and made inaccessible in the source. Implementing a special interface, and special casing that, is not sufficient, unless all the other objects in the structure also implement that interface. (Or at least any other object in the structure which has a reference to it from anywhere outside the structure.)

Just knowing whether there is a problem probably requires a special garbage collection step prior to sending, to figure out which references need to be modified, and that means traversing the entire structure anyway. Then you might as well copy it.

Very much non-trivial.

Sending immutable objects sounds much more promising.

lrhn avatar Mar 06 '22 11:03 lrhn

I agree that sending arbitrary "big" objects (a big graph of objects) is a complex task to perform in an efficient and transparent way in a garbage-collected, object-oriented language. At the very least it would demand that the developer create an object graph while ensuring that nothing points into it (only weak references), which is NOT a realistic scenario.

I vote for allowing the creation of truly immutable collections (List, Map, Set + const instances), which can be used in many real-world scenarios and allow sharing of data, not only to return the result of a task in an Isolate, but also to improve the bootstrap time of a new task in an Isolate, since the data bottleneck exists at the beginning and end of a parallel task.

Thanks for the discussion 👍🏻

gmpassos avatar Mar 06 '22 21:03 gmpassos

I wonder why we're afraid of sharing mutable objects between threads? In other OOP languages it's already possible and to make things thread safe we can use mutex / locking mechanisms to "solve" it.

themisir avatar Mar 07 '22 09:03 themisir