
Add asynchronous File IO

mfelsche opened this issue 6 years ago • 7 comments

This issue tries to spark discussion around (1) the need for asynchronous file I/O, (2) possible implementations thereof, and (3) the look and feel of such an asynchronous file API for Pony. The new asynchronous file I/O could be added alongside the existing blocking file I/O APIs.

  1. Current file operations in Pony use standard POSIX calls like write/writev, read, etc., all of which can block. This means that while performing such an operation on a file, one scheduler thread is blocked for the duration of the call. This can be a serious performance problem, and it is the reason I am bringing this up.

  2. This is actually the tricky part. As far as I know, ASIO, which is used for all other networking, pipe, and stdstream I/O, will not work on regular files. Windows has some kind of asynchronous file I/O that I know nothing about; if anyone could shed some light on it, that would be great. POSIX offers the aio_* APIs, which basically offload file I/O to a separate thread pool in userland. This API, I think, is a good candidate due to cross-platform compatibility. Another option would be libuv, which is completely cross-platform and offers async name resolution as well. It does file I/O in a conceptually similar manner to the aio API, in that it uses the blocking file APIs but executes them on a separate thread pool. It seems a bit overkill for the problem at hand, and it possibly makes the most sense to move all I/O operations to libuv entirely instead of adding it alongside ASIO.
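
For reference, a minimal sketch of what the POSIX aio_* style looks like in C (the filename is hypothetical, error handling is trimmed, and on Linux with glibc you link with -lrt):

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
  int fd = open("data.bin", O_RDONLY);  /* hypothetical file */
  if (fd < 0) { perror("open"); return 1; }

  char buf[4096];
  struct aiocb cb;
  memset(&cb, 0, sizeof(cb));
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = sizeof(buf);
  cb.aio_offset = 0;

  /* Enqueue the read; glibc services it on a user-space thread pool. */
  if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

  /* Poll for completion instead of blocking; a real runtime would go
   * do other work here (or request a signal/thread notification). */
  while (aio_error(&cb) == EINPROGRESS) {}

  ssize_t n = aio_return(&cb);
  printf("read %zd bytes\n", n);
  close(fd);
  return 0;
}
```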

mfelsche avatar Jul 16 '18 19:07 mfelsche

  1. Would it make sense to roll our own thread pool for blocking I/O operations and integrate it into the existing ASIO implementation? That would e.g. mean we register an ASIO event, read from a file on the thread pool, and when the data is there, send it from the thread pool to the Pony schedulers via an ASIO event. The reason I am suggesting this is that we most likely have hard performance constraints that other libs might not satisfy. And it might be the quickest to do, given we get the thread pool right. (What could go wrong? ;-))
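
A minimal sketch of that mechanism, under the assumption that completion is signalled through a pipe the event loop already polls (the actual ASIO registration details are elided, and the filename is hypothetical):

```c
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int notify_pipe[2];

static void* worker(void* arg) {
  /* Blocking file I/O happens off the scheduler threads. */
  FILE* f = fopen((const char*)arg, "rb");
  char buf[4096];
  size_t n = f ? fread(buf, 1, sizeof(buf), f) : 0;
  if (f) fclose(f);
  /* Hand off the result (elided) and wake the event loop. */
  write(notify_pipe[1], &n, sizeof(n));
  return NULL;
}

int main(void) {
  pipe(notify_pipe);
  pthread_t t;
  pthread_create(&t, NULL, worker, (void*)"data.bin");  /* hypothetical */

  /* The event loop sleeps in poll() exactly as it would for sockets. */
  struct pollfd pfd = { .fd = notify_pipe[0], .events = POLLIN };
  poll(&pfd, 1, -1);

  size_t n;
  read(notify_pipe[0], &n, sizeof(n));
  printf("worker read %zu bytes\n", n);
  pthread_join(t, NULL);
  return 0;
}
```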

mfelsche avatar Jul 17 '18 14:07 mfelsche

We've talked about libuv in the past - the consensus at the time was that adapting libuv to our purposes would be more hassle than help. Maybe we can discuss it again though, if it would be helpful.

jemc avatar Aug 03 '18 14:08 jemc

I do not think libuv is the right approach for us. I think something along the lines of Erlang's "dirty schedulers" would be the correct approach. I reserve the right to change my opinion later.

SeanTAllen avatar Aug 03 '18 17:08 SeanTAllen

@SeanTAllen or @slfritchie could you elaborate on the concept of dirty schedulers? Would that basically mean we flag behaviours based on whether they do blocking I/O and, depending on that, schedule them on a special scheduler pool? The advantage here would be that we could keep the blocking APIs synchronous, and thus simple (e.g. like the current files API). Would that actually be the case?

mfelsche avatar Aug 09 '18 17:08 mfelsche

The Erlang BEAM VM scheduler differs from Pony's in a couple of significant ways: BEAM's is preemptive, and it avoids using wall-clock time (or any other traditional notion of time) when making preemption decisions.

Preemption can be triggered by a reduction count threshold (a reduction is roughly equivalent to a function call), a VM-internal trap, or a blocked message receive (the mailbox is empty, or a selective-receive pattern match fails on all queued messages).

The addition of NIFs (native implemented functions), which are written in C but appear to the Erlang programmer to be ordinary Erlang functions, can cause a big problem for the reduction count method. Steve Vinoski was a primary author of the NIF scheme. In https://github.com/vinoski/bitwise/blob/master/vinoski-schedulers.pdf he notes a problem with a NIF that implements an XOR function:

  • "Blocked a scheduler thread for 5.86 seconds
  • And only 4 reductions"

That causes all kinds of havoc with the schedulers. It's even more "hilarious"(*) when schedulers start going to sleep due to mis-counted reductions and then never bother waking up, despite huge demand to schedule runnable processes. Note also that performing I/O isn't necessary: anything that blocks the return of control to the scheduler is fair game, including XOR calculations on gigabytes of data or simply calling sleep(3).

Nowadays, a NIF can have metadata associated with it to mark it as "dirty". Execution of dirty NIFs is transferred over to a dedicated set of Pthreads, the dirty thread pool. There's a non-zero overhead for switching threads, naturally, but it's far better than angering the usual schedulers' way of doing things.
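
For illustration, this is roughly what that metadata looks like with the OTP 20+ erl_nif API (the module and function names here are hypothetical, and the NIF body is elided):

```c
#include "erl_nif.h"

static ERL_NIF_TERM xor_nif(ErlNifEnv* env, int argc,
                            const ERL_NIF_TERM argv[]) {
  /* ... long-running XOR over a large binary ... */
  return enif_make_atom(env, "ok");
}

/* The fourth field flags the NIF as a dirty CPU-bound job, so the VM
 * runs it on the dirty scheduler thread pool instead of tying up a
 * normal scheduler thread. */
static ErlNifFunc nif_funcs[] = {
  {"xor_blob", 2, xor_nif, ERL_NIF_DIRTY_JOB_CPU_BOUND}
};

ERL_NIF_INIT(bitwise, nif_funcs, NULL, NULL, NULL, NULL)
```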

With the Pony runtime's cooperative scheduling approach, I'm not aware of many choices. One would be to always run an actor that might block its Pthread on a separate Pthread pool. Another is a message-passing approach: send a message to a dedicated thread or thread pool that executes the desired operation and then sends the result back. The latter is how Erlang's original file I/O subsystem operated, but I see no easy way to fit that scheme into Pony's runtime today without lots of other side effects and consequences.

@mfelsche's idea is to use the separate pool only for behaviors that are "known" to do blocking stuff. I hadn't thought of that, silly me. It's a nifty idea and probably deserves a lot more pondering.

BEAM references for the curious:

  • https://happi.github.io/theBeamBook/#CH-Scheduling ... Erik Stenman's book-but-not-a-book, which probably describes the BEAM scheduler in more detail than anything else outside of the source code
  • https://hamidreza-s.github.io/erlang/scheduling/real-time/preemptive/migration/2016/02/09/erlang-scheduler-details.html ... "Erlang Scheduler Details and Why It Matters" blog article
  • http://jlouisramblings.blogspot.com/2013/01/how-erlang-does-scheduling.html "How Erlang does scheduling" and https://medium.com/@jlouis666/erlang-dirty-scheduler-overhead-6e1219dcc7 "Erlang Dirty Scheduler Overhead: Using DTrace to figure out what calls cost" ... blog articles
  • https://github.com/vinoski/bitwise/blob/master/vinoski-schedulers.pdf and https://github.com/vinoski/bitwise ... Steve Vinoski's early work on the dirty scheduler implementation (IIRC back in the R16 or R17 days; Erlang is now at major release 21)

(*) Where "hilarious" means "terrible things happen at weird times or the worst possible high-demand times".

slfritchie avatar Aug 09 '18 20:08 slfritchie

Leaving aside the question of "how do we know something will block", I think what we would want is...

  • normal scheduler behavior for "non-blocking" calls
  • a pool of threads that can "take over" one or more CPUs to do blocking calls. The scheduling in this "dirty pool" would be along the lines of how Go does scheduling. See https://www.youtube.com/watch?v=NjMGHrM2cc0&list=PL2ntRZ1ySWBdatAqf-2_125H4sGzaWngM&index=8 for a lot of good info on that.
  • when a CPU is "stolen" for the dirty pool, the scheduler thread tied to that CPU is paused (like how we currently do scaling) until the CPU(s) are available again.

SeanTAllen avatar Sep 12 '18 20:09 SeanTAllen

I'm sure some of you have heard about the new asynchronous I/O interface in Linux 5.1, io_uring, but I thought I'd leave a note about it here nonetheless.

Here's a document that goes into detail about the new interface: http://kernel.dk/io_uring.pdf
And here's a good LWN article about it: https://lwn.net/Articles/776703/

Under section 3.0, "New interface design goals", in the document:

  • Extendable. While my background is mostly storage related, I wanted the interface to be usable for more than just block oriented IO. That meant networking and non-block storage interfaces that may be coming down the line. [...].

It sounds like in the future, the interface may support asynchronous network I/O as well.
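
For reference, a minimal single-read sketch using the liburing helper library (the filename is hypothetical, error handling is trimmed; requires Linux 5.1+ and linking with -luring):

```c
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void) {
  struct io_uring ring;
  io_uring_queue_init(8, &ring, 0);   /* 8-entry submission queue */

  int fd = open("data.bin", O_RDONLY);  /* hypothetical file */
  char buf[4096];
  struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

  /* Queue one read on the submission ring, then tell the kernel. */
  struct io_uring_sqe* sqe = io_uring_get_sqe(&ring);
  io_uring_prep_readv(sqe, fd, &iov, 1, 0);
  io_uring_submit(&ring);

  /* Wait for (or poll for) the matching completion queue entry. */
  struct io_uring_cqe* cqe;
  io_uring_wait_cqe(&ring, &cqe);
  printf("read returned %d\n", cqe->res);
  io_uring_cqe_seen(&ring, cqe);

  io_uring_queue_exit(&ring);
  close(fd);
  return 0;
}
```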

On Windows, IOCP (I/O completion ports) exists for asynchronous I/O.
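
A minimal sketch of that model: a file handle opened for overlapped I/O, associated with a completion port that collects the result (the filename is hypothetical and error handling is trimmed):

```c
#include <windows.h>
#include <stdio.h>

int main(void) {
  HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
                            NULL, OPEN_EXISTING,
                            FILE_FLAG_OVERLAPPED, NULL);  /* hypothetical file */

  /* Associate the handle with a new completion port. */
  HANDLE iocp = CreateIoCompletionPort(file, NULL, /*key=*/1, 0);

  char buf[4096];
  OVERLAPPED ov = {0};  /* read from offset 0 */
  ReadFile(file, buf, sizeof(buf), NULL, &ov);  /* returns immediately */

  /* Block (or poll with a timeout) until the kernel posts completion. */
  DWORD bytes; ULONG_PTR key; OVERLAPPED* done;
  GetQueuedCompletionStatus(iocp, &bytes, &key, &done, INFINITE);
  printf("read %lu bytes\n", (unsigned long)bytes);

  CloseHandle(iocp);
  CloseHandle(file);
  return 0;
}
```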

Svenskunganka avatar May 11 '19 12:05 Svenskunganka