Asynchronous IO API

Open RealyUniqueName opened this issue 5 years ago • 80 comments

Lower-level async API. Take 1. I tried to stay close to C API.

Please review thoroughly, especially the socket-related API, because I don't have much experience with sockets. Your thoughts on the TODOs are also much appreciated.

While I tried to define the behavior in doc blocks, I expect things to change during implementation for the sake of cross-platform consistency.

filesystem

  • [x] php
  • [x] eval
  • [x] jvm
  • [x] java
  • [ ] cs
    • [x] base implementation
    • [ ] unix/windows differences (see TODO comments)
  • [x] hl (depends on #10342)
  • [ ] cpp (depends on #10406)
  • [x] lua
  • [x] python
  • [ ] (?) hxnodejs

net TODO

processes TODO

RealyUniqueName avatar Jan 29 '20 10:01 RealyUniqueName

The docs are a bit wonky in places, but this can be cleaned up later.

Aurel300 avatar Jan 29 '20 11:01 Aurel300

I don't much like asyncio as a top-level package. I thought we would be using an asys top-level package that would somewhat mimic the sys one, but with async variants and an upgraded API.

If you need a package specifically for low-level classes, it should be asys.nativeio or something similar.

ncannasse avatar Jan 29 '20 13:01 ncannasse

@ncannasse I'll move everything to asys.native once the API is settled.

RealyUniqueName avatar Jan 30 '20 14:01 RealyUniqueName

It's amazing to see Haxe implementing an async API!!

I just wonder, what would be the technical implications of using callbacks rather than async/await terminology?

porfirioribeiro avatar Feb 08 '20 21:02 porfirioribeiro

> I just wonder, what would be the technical implications of using callbacks rather than async/await terminology?

Haxe will be getting both asys (an API similar to this PR) and coroutines in the future. Coroutines are a big undertaking because they have to be integrated properly into the language. It is not just a case of adding async and await as keywords (which would also be a breaking change).

And, once we do have asys (with callback-style API) and coroutines, the two should be easy to join. It may be that our coroutines will be designed in a way that callback-style API is directly compatible with an asynchronous call.

Aurel300 avatar Feb 08 '20 22:02 Aurel300

This goes to 4.2, according to our plans :)

nadako avatar Apr 05 '20 20:04 nadako

Sorry for the intervention, but I'm worried you are all missing some important points. This API does not seem to account for multithreaded environments in any way. It's a straight copy of Node.js, which does not have that problem by design. Unfortunately, most of the "sys" platforms support threading. Therefore I have some questions:

  • What mechanism will be responsible for invoking callbacks in a long-lived application? Typically there has to be a message loop of some sort to perform this task.
  • If there is a message loop, how do you start and stop it in a background thread, so that any async API calls return on that background thread?
  • Will there be a mechanism allowing callbacks to be processed periodically in a background thread, without waiting, so that the background thread can continue to perform an iterative task?
  • If all callbacks are invoked strictly on one "main" thread, can that thread be assigned manually? In some applications, the startup thread is not suitable for such work. For example, it may be the only thread capable of performing 3D rendering reliably, so background threads will be more suitable for loading, parsing, and properly structuring the external data.

The EntryPoint/MainLoop classes seem to be a solution only for the "one thread to call them all" case. As far as I can tell, these classes are not designed for use with callbacks in background threads. It'd be nice to have an equivalent of SynchronizationContext from C# or Looper from the Android API.

ntrf avatar May 10 '20 11:05 ntrf

Just to mention that Deno 1.0 has been released: https://deno.land/v1. The "design mistakes" in Node, as stated by the author, and their counterparts in the current version might be worth checking.

farteryhr avatar May 14 '20 14:05 farteryhr

> Sorry for the intervention, but I'm worried you are all missing some important points. This API does not seem to account for multithreaded environments in any way. <...>

The idea is that callbacks get executed in the same thread they were created in. That's supposed to be the common denominator of all of our (very different) targets. What happens behind the API is target implementation details, which can be specified later should the need arise.
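To make the "same thread" rule concrete, here is a minimal sketch, written in Python for brevity rather than Haxe and not taken from this PR. It assumes a hypothetical per-thread `Loop` with a thread-safe queue: the blocking work runs on a worker thread, but the callback is only ever invoked by the thread that created the loop. The names `Loop`, `post`, `run`, and `read_file_async` are illustrative and do not exist in the proposed API.

```python
import os
import queue
import threading


class Loop:
    """Hypothetical per-thread callback queue (not part of the proposed API)."""

    def __init__(self):
        self.owner = threading.get_ident()  # thread that created the loop
        self._pending = queue.Queue()       # thread-safe hand-off point

    def post(self, fn, *args):
        # May be called from any thread; fn itself runs later, inside run().
        self._pending.put((fn, args))

    def run(self, expected):
        # Invoke `expected` callbacks on the owning thread, blocking until
        # each one has been posted.
        assert threading.get_ident() == self.owner
        for _ in range(expected):
            fn, args = self._pending.get()
            fn(*args)


def read_file_async(loop, path, callback):
    """Read `path` on a worker thread, then deliver callback(error, data) via `loop`."""

    def work():
        try:
            with open(path, "rb") as f:
                data = f.read()
            loop.post(callback, None, data)
        except OSError as err:
            loop.post(callback, err, None)

    threading.Thread(target=work).start()


seen = []
loop = Loop()
read_file_async(
    loop,
    os.devnull,
    lambda err, data: seen.append((threading.get_ident(), err)),
)
loop.run(expected=1)
```

After `loop.run` returns, `seen` holds the identifier of the thread that executed the callback, which is the creating thread and not the worker. How each target achieves this behind the API is exactly the implementation detail the comment above leaves open.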

RealyUniqueName avatar Jul 06 '20 17:07 RealyUniqueName

A question: how does this pull request relate to https://github.com/HaxeFoundation/haxe/pull/8832/?

Basically, it looks to me like both pull requests take very similar approaches. (The current pull request, however, already implements the targets themselves).

lublak avatar Apr 22 '21 06:04 lublak

That PR was the first attempt to design the API. It's based on signals. This PR is lower level and CPS-based. It allows a signal-based, coroutine-based, or any other API to be designed on top of it. This PR is supposed to get into the std lib eventually.
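As an illustration of layering a coroutine-style API on a CPS one, here is a small sketch in Python (not Haxe, and not code from either PR). It assumes the callback convention `callback(error, result)`; `read_text`, `FILES`, and `awaitable` are hypothetical names invented for the example.

```python
import asyncio

# In-memory stand-in for a filesystem, so the example is deterministic.
FILES = {"greeting.txt": "hello"}


def read_text(path, callback):
    """Hypothetical CPS-style call: reports through callback(error, result)."""
    if path in FILES:
        callback(None, FILES[path])
    else:
        callback(FileNotFoundError(path), None)


def awaitable(cps_function, *args):
    """Adapt a callback(error, result) function to an awaitable future."""
    future = asyncio.get_running_loop().create_future()

    def on_done(error, result):
        if error is not None:
            future.set_exception(error)
        else:
            future.set_result(result)

    cps_function(*args, on_done)
    return future


async def main():
    text = await awaitable(read_text, "greeting.txt")
    try:
        await awaitable(read_text, "missing.txt")
    except FileNotFoundError as err:
        return text, type(err).__name__


result = asyncio.run(main())
```

The adapter is a few lines because the CPS layer already reports exactly one outcome per call. A signal-based layer would need more machinery only where a callback can fire more than once.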

RealyUniqueName avatar Apr 22 '21 08:04 RealyUniqueName

> That PR was the first attempt to design the API. It's based on signals.

That's not entirely true. E.g. the basic filesystem class is callback based, just like here. Signals are mostly there for when a "callback" can actually be invoked multiple times. An example of this is the FileWatcher.

Aurel300 avatar Apr 22 '21 09:04 Aurel300

Thank you both for the explanation :) And I really hope it goes into the std lib.

lublak avatar Apr 22 '21 11:04 lublak

I just tried reviving this PR, but something is quite wrong... we might need a new one.

Simn avatar Apr 11 '22 15:04 Simn

@Simn Is it not just the conflict in tests/runci/targets/Php.hx that must be resolved (e.g. by merging) before the CI can run?

Aurel300 avatar Apr 12 '22 11:04 Aurel300

This conflict does not exist in my local checkout, so I don't know what's going on.

Simn avatar Apr 12 '22 11:04 Simn

Well...

Simn avatar Apr 12 '22 13:04 Simn

Has to be remerged as well.

tobil4sk avatar Apr 12 '22 13:04 tobil4sk

I'm still very confused but this seems to have done the trick indeed, thanks!

Simn avatar Apr 12 '22 13:04 Simn

It's because the file was changed in development after the first merge, so it had to be merged again.

tobil4sk avatar Apr 12 '22 13:04 tobil4sk

Oooh the sqlite PR, right... Talk about unfortunate timing, heh. Thanks for solving that mystery!

Simn avatar Apr 12 '22 13:04 Simn

I'd like to help out with this in some capacity. @RealyUniqueName @Simn and anyone else who's working on it, do you have Ko-fi or any way I can donate money to say thanks?

NQNStudios avatar Sep 06 '22 18:09 NQNStudios

I'd be willing to work on the hxcpp implementation. A few months ago I made my own libuv-powered asio library and EventLoop replacement for hxcpp (https://github.com/Aidan63/hxcpp_luv_io/). It only covers the use cases I need, but it works well (no memory leaks, plays nicely with the GC, etc.).

Before that I looked at the current hxcpp asys draft, and it seemed pretty early on, which is why I opted to start mine from scratch, as I only needed some functionality to start with.

Aidan63 avatar Nov 19 '22 11:11 Aidan63

There are libuv bindings in my fork, which AFAIR should be either complete or very close to complete: https://github.com/HaxeFoundation/haxe/compare/development...RealyUniqueName:haxe:cpp/libuv

RealyUniqueName avatar Nov 19 '22 20:11 RealyUniqueName

Oh, I even made a PR https://github.com/HaxeFoundation/haxe/pull/10406

RealyUniqueName avatar Nov 19 '22 20:11 RealyUniqueName

I'm almost sure I did implement Asynchronous IO API for cpp on top of that PR, but I can't find it now...

RealyUniqueName avatar Nov 19 '22 20:11 RealyUniqueName

Yeah, I used the buildXml from your cpp PR for my lib. I'm certain you started on the asys cpp implementation, as I'm pretty sure I used your EventLoop as the basis for mine. I also remember seeing some TODO comments about finding a better way to prevent the GC from collecting callbacks, which is why I moved more of my library into C++ rather than externs, as I could manually root and unroot objects with the GC, so it must be around somewhere....

Aidan63 avatar Nov 21 '22 12:11 Aidan63

I've got a draft implementation with the loop and file APIs done and will plod away at the rest of it (https://github.com/HaxeFoundation/hxcpp/pull/1022). I've put the haxe side of things in this repo for now (https://github.com/Aidan63/hxcpp_asys).

Aidan63 avatar Dec 23 '22 20:12 Aidan63

I've implemented all the file, directory, and file system APIs in my hxcpp PR and I've started on the net and process stuff. Here's a list of various things I've noted down while implementing it.

  • Should the file class implement the stream api? (https://github.com/Aidan63/hxcpp_asys/issues/1)
  • The read and write functions take in an array, offset, and length. Would it make more sense to use the ArrayBufferView type, which provides a wrapper around that? (https://github.com/Aidan63/hxcpp_asys/issues/2)
  • Are we requiring any safety on the buffers passed in for writing? I.e. what should happen if the user adds data to the buffer after sending it to write? Originally I copied the entire array into a C++ vector and sent that to libuv, which avoids this issue, but for large buffers this uses up a lot of memory. Currently I write buffers in chunks, copying at most numeric_limits<uint16_t>::max bytes of the buffer into a C++ vector at a time and writing that, so users modifying the original Haxe bytes after sending could cause issues (https://github.com/Aidan63/hxcpp_asys/issues/3).
  • What should happen if the user tries to do something like call a read or write on an opened file from a thread other than the one it was opened on? I could store the original thread, check it on each call, and error if it's a different thread, or I could shuffle the request onto the original thread and the callback response onto the calling one (https://github.com/Aidan63/hxcpp_asys/issues/4).
  • Most of the fields in the stat buffer are uint64s but the Haxe typedef has them as ints. Are we living with the loss of precision, or should the typedef be updated to use haxe.Int64? (https://github.com/Aidan63/hxcpp_asys/issues/5)
  • On the topic of 64bit I see there's a big buffer type which is unused. Is / was it planned to have support for this type on files and streams? (https://github.com/Aidan63/hxcpp_asys/issues/6)
  • There is now a FilePath type which is an abstract over a string, but we also have a Path class. Is there some way to unify these two to avoid confusion? I understand wanting an abstract over a string vs a class but having both of these looks a bit odd. (https://github.com/Aidan63/hxcpp_asys/issues/7)
  • I think it would be good to have some sort of "shell execute" option for the process API, meaning the process is launched through a shell. Dotnet has this, and I have this option in my custom libuv IO library, as it makes launching programs that are in the user's PATH much easier. (https://github.com/Aidan63/hxcpp_asys/issues/8)
  • What's the idiomatic way of using the two callback arguments, and are we saying the non-exception argument should be null if an error has occurred? Most of the time I've been using an if / switch to check if the exception is null, but this seems to open up the possibility that the user could ignore the exception and go straight for the data (which isn't marked as nullable) and run into a null error even with null safety enabled. I think I saw some comments about wanting to add some compiler magic for the callback type, but would it be easier to just add a haxe.ds.Result type? It avoids any ambiguity around null-ness and isn't going to force the user to write more code, as they should (presumably) be checking if the exception is null anyway. (https://github.com/Aidan63/hxcpp_asys/issues/9)
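To illustrate the last point, the difference between an `(error, value)` pair and a single result value can be sketched as follows. This is Python rather than Haxe, and `Ok`, `Err`, `from_pair`, `file_size`, and `SIZES` are hypothetical names for the example; they are not the proposed haxe.ds.Result or anything in my PR.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    error: Exception


# A result is exactly one of the two cases; there is no "both null" state.
Result = Union[Ok[T], Err]


def from_pair(error, value):
    """Convert the (error, value) callback convention into a Result."""
    return Err(error) if error is not None else Ok(value)


# In-memory stand-in for stat data, so the example is deterministic.
SIZES = {"a.bin": 1024}


def file_size(path, callback):
    """Hypothetical async-style call that reports a single Result."""
    if path in SIZES:
        callback(Ok(SIZES[path]))
    else:
        callback(Err(FileNotFoundError(path)))


def describe(result):
    # The value is only reachable after checking which case we have.
    if isinstance(result, Ok):
        return f"{result.value} bytes"
    return f"failed: {type(result.error).__name__}"


messages = []
file_size("a.bin", lambda r: messages.append(describe(r)))
file_size("nope.bin", lambda r: messages.append(describe(r)))
```

With the pair convention the caller can read the value without looking at the error; with a single result the value sits behind the case check, which is the ambiguity the question is about.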

Aidan63 avatar Jan 16 '23 21:01 Aidan63

I would like to reiterate my concerns about the design of the FileInfo API. Half the fields don't really make sense on Windows, and IIRC on some targets (mainly C#, I think) we cannot get some of that info anyway and thus would have to return nonsense even on Unix platforms.

Personally I think the asys API should stick to what can be implemented consistently on every platform/target, and that Unix-specific APIs should be in some kind of extension class so that it is clear to the user that they are not portable.

Other thoughts: everyone is so excited about asys that we seem to be forgetting about sys. It might be good to improve sys too, and build the asys api on top of that?

> Most of the fields in the stat buffer are uint64s but the haxe typedef has them as ints, are we living with the loss of precision or should the typedef be updated to use haxe.Int64?

The loss of precision will cause the timestamps to overflow in ~15 years. With Int64 we'll have ~230 years before running into trouble.
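A quick check of those figures, sketched in Python. The first limit assumes a signed 32-bit count of seconds since the Unix epoch. The second assumes a signed 64-bit count of nanoseconds, which is one reading that lands in the stated range; the unit is an assumption of the sketch, not something fixed by the typedef.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value of a signed 32-bit counter of seconds.
int32_limit = EPOCH + timedelta(seconds=2**31 - 1)

# Largest value of a signed 64-bit counter of nanoseconds (assumed unit).
# timedelta has microsecond resolution, so the last three digits are dropped.
int64_ns_limit = EPOCH + timedelta(microseconds=(2**63 - 1) // 1000)

print(int32_limit.isoformat())     # 2038-01-19T03:14:07+00:00
print(int64_ns_limit.isoformat())  # 2262-04-11T23:47:16.854775+00:00
```

Counted from early 2023, that is about 15 years for the 32-bit case and a little under 240 years for the 64-bit nanosecond case.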

Apprentice-Alchemist avatar Jan 17 '23 07:01 Apprentice-Alchemist