YamlDotNet icon indicating copy to clipboard operation
YamlDotNet copied to clipboard

Add async support for all IO operations

Open aaubry opened this issue 5 years ago • 18 comments

Add async overloads of all methods that perform IO. I am opening this ticket because I feel that this should eventually be implemented. If this is a feature that would be valuable to you, please say so in the comments.

aaubry avatar Aug 28 '19 10:08 aaubry

I wonder what the performance benefit would actually be here, since in my experience this is already pretty quick!

Would you create a second set of functionality with the Async suffix? Or would this be an upgrade and replace of all non async functionality to be async?

samsmithnz avatar Dec 07 '19 13:12 samsmithnz

The intention is more to be able to support workflows that only allow async operations. For example, the aspnet request or response streams, which by default require async. I don't have a clear design in mind yet, but I think that the synchronous API would still be necessary, therefore there would need to be a second set of APIs with the async suffix. Forcing async usage in all cases would have a performance cost for all memory-only use cases, which I think are relatively common. What are your thoughts on this ?

aaubry avatar Dec 07 '19 19:12 aaubry

I like your plan, it makes sense. If you want to identify particular functions that you'd like to convert, I'm happy to contribute and work on this. I have some experience writing async functions...

samsmithnz avatar Dec 07 '19 20:12 samsmithnz

Sure. At the moment I can't dedicate too much time to this feature, so any help is welcome!

aaubry avatar Dec 08 '19 09:12 aaubry

I've created an initial commit and would value your input. Would you mind having a peak to make sure it meets your standards before I continue?

Note that this is only for "serialize" so far, and I've only written one async test (I tried to write an alternative for a (relatively) long running unit test, which you can see, appears to have a significant performance boost):

image

Pull Request: #457 (build currently failing due to some performance test issue I will also investigate)

samsmithnz avatar Dec 08 '19 21:12 samsmithnz

I took a look at picking up #457 -- I agree with the comment there that async should start at the Emitter and propagate up the stack. I think due to the number of callsites to Write() (which would need to be async and then the callsites would themselves need to be within async methods) that doing this while retaining the synchronous API would result in a lot of code duplication. I think there are 3 possible paths:

  1. Drop the synchronous API
  2. Make a thin-synchronous wrapper around the otherwise async implementation (generally frowned on)
  3. Invent some new architecture that somehow reduces how far async would propagate (I don't have any ideas here)

Personally, those paths are in order of preference and it's likely I could devote some significant development time towards 1 or 2.

willson556 avatar Apr 21 '20 23:04 willson556

Regarding the options that you suggest, I think that only the first one is viable. Option 2 would add very little value, I think, and if there was an easy way to implement option 3, it would probably already be built into the C# language.

Not keeping the synchronous API is a problem because we target platforms that do not support async. I'm reluctant to discontinue their support and would rather not have async APIs if this proved too difficult to maintain.

I have another concern which is that the emitter currently performs a lot of small writes to the underlying TextWriter. Most of them will probably end up just writing to a buffer and return synchronously. This means that we'll end up having a lot of overhead and maybe the end result will be worse than using the synchronous APIs... We would need to perform some tests before fully committing to using async. I believe that using System.IO.Pipelines could help here but would require a complete reimplementation of the parser and emitter.

Do you have ideas to address these issues ?

aaubry avatar Apr 22 '20 20:04 aaubry

I agree with everything stated above. I started to look at how other serializers had handled this. It looks like Newtonsoft.Json hasn't fully implemented async support -- they seem to have support in the parser but not in the serializer. I then took a look at System.Text.Json and they took an interesting approach to leave async out of most of the codebase: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Write.Stream.cs#L76

They seem to be calling the normal synchronous code and then just copying the output to the stream asynchronously. This sort of approach would seem to be a lot easier to implement in YamlDotNet but I'm not sure how much it really improves performance? I may try and throw together a test of that approach in the next couple of days to see.

willson556 avatar Apr 24 '20 01:04 willson556

Regardless of performance with more conventional streams, the JsonSerializer approach would meet the stated goal of supporting async-only streams.

willson556 avatar Apr 24 '20 01:04 willson556

Just came across https://devblogs.microsoft.com/dotnet/system-io-pipelines-high-performance-io-in-net/ which seems highly applicable here (and I now realize you pointed out in your last comment).

willson556 avatar Apr 27 '20 14:04 willson556

Yes, this would probably be the way forward, although we would have issues with some of the targetted platforms. I don't think that it is very important to keep supporting .NET 2.0, but Unity uses .NET 3.5 and I believe that there are many users of the library on that platform.

aaubry avatar May 01 '20 12:05 aaubry

I think with that approach it might be possible to avoid excessive code duplication and use conditional compiles and package references to maintain compatibility. I haven't yet had time to prototype anything concrete to see if it pans out.

willson556 avatar May 01 '20 19:05 willson556

Antoine, do you need help on this?

lesair avatar May 21 '20 18:05 lesair

I'm not currently working on this, but maybe @willson556 does. I'm currently working on a different feature, schemas, and can't spend time on this feature at the moment.

aaubry avatar May 22 '20 08:05 aaubry

I haven't started work on this either. I was sort of waiting to see the outcome of #482 and #486 before proceeding with further contributions though I do have a long-run interest in building this.

It is my current belief based on some work we've done since April that it would not be too difficult to do this using the producer/consumer pattern utilizing System.IO.Pipelines or System.Threading.Channels to bridge the gap between calls to Write() in the emitter and async IO. I haven't looked at the read side but I imagine a similar pattern would be possible there as well.

willson556 avatar Jun 09 '20 04:06 willson556

Sounds like fun. How can I help?

lesair avatar Jun 09 '20 14:06 lesair

Any news on this topic since 2020? Some .NET parts do block synchronous IO operations, so having an async API here would definitely make things smoother.

goncalo-oliveira avatar Mar 30 '23 15:03 goncalo-oliveira

+1 to a high-speed implementation based on System.IO.Pipelines.

goldsam avatar Oct 26 '23 14:10 goldsam