csharp-driver
csharp-driver copied to clipboard
CSHARP-994 Add AsyncEnumerable support to RowSet
Any async method in Mapper can block the thread if there are more than one page in RowSet. This can easily starve the thread pool and obliterate the performance of your app.
The proper fix it to make RowSet implement IAsyncEnumerable. The latter is only available in .NET Core 3 though. I could make the driver target this version but it reached end of life 7 years ago. According to versionsof.net/core, the oldest active .NET version is 8 so that's what I picked.
The PR does not compile right now because the .NET 8 target generates a bunch of obsolete warnings (upgraded to errors) about System.Runtime.ConstrainedExecution and TLS1.X. Also, tests are missing.
I would like to get some first feedbacks before resuming the work, especially around adding this new target framework. I believe it's time to add new targets as .NET Standard is now 8 years old.
I agree that making RowSet implement IAsyncEnumerable would be ideal but that will require some discussion about .NET targets so if we're more concerned about Mapper methods internally enumerating through RowSet in a "sync" manner then we could change that without having to rely on IAsyncEnumerable, we would just have to make the mapper iterate "manually" through the rowset in an async way. This would be the cleanest way to fix this without getting into an extended discussion about moving to newer .NET targets.
Irt .NET targets it would make more sense to move from .NET Standard 2.0 to .NET Standard 2.1 as that would allow us to keep supporting .NET platforms that support .NET standard while at the same time being able to leverage features like IAsyncEnumerable
Also the mapper not relying on IAsyncEnumerable will also make it so that issue is fixed in .NET Framework as well (async streams are not supported on .NET Framework)
we would just have to make the mapper iterate "manually" through the rowset in an async way.
https://github.com/datastax/csharp-driver/blob/aa4f36e5c2e01042de82bab42a004bbd9a1c594b/src/Cassandra/Mapping/ICqlQueryAsyncClient.cs#L30
This is tricky because FetchAsync returns an IEnumerable. So, it tranfers the responsibility to enumerate the rows to the caller. And since they only have a synchronous API, all they can do is to block the thread.
In my company, I have applied some work arounds where the performance impact was alarming. The first one was to greatly increase the page size so all the rows would fit on a single page. That greatly impacted the GC of the client but it also impacted the server and I had to find another work around when the team operating cassandra came to us. The second one was this
var rows = await _session.ExecuteAsync(stmt, profile);
while (!rows.IsFullyFetched)
{
await rows.FetchMoreResultsAsync();
}
return rows.Select(r =>
{
var x = r.GetValue<int>(0);
var y = r.GetValue<string>(1);
var z = r.GetValue<byte[]>(2);
return new MyObject(x, y, z);
}).ToArray();
This seems to be fix the thread starvation too and is more friendly to the GC. Though, all the rows must be in memory at some point.
But anyway it's just a work-around that most developers won't know about.
Also the mapper not relying on IAsyncEnumerable will also make it so that issue is fixed in .NET Framework as well
One good thing about that IEnumerable API is that all the rows don't have to be loaded in memory at the same time. The developer can chose to load them at their own pace. I'm not sure we can achieve the same in async without IAsyncEnumerable.
implement IAsyncEnumerable would be ideal but that will require some discussion about .NET targets Irt .NET targets it would make more sense to move from .NET Standard 2.0 to .NET Standard 2.1
I believe a serious discussion should be had around which .NET version to target. I'm sure you have many clients still using .NET Framework but double targeting the LTS could allow developers using more recent features of .NET while still support old ones.
In my company, I have applied some work arounds where the performance impact was alarming. The first one was to greatly increase the page size so all the rows would fit on a single page. That greatly impacted the GC of the client but it also impacted the server and I had to find another work around when the team operating cassandra came to us. The second one was this
Have you tried manual paging as described in the manual? That should fix the issue but it's a bit more code than relying on automatic paging. This is generally the best approach if you're concerned about thread starvation and GC pressure.
I think we can move from netstandard2.0 to netstandard2.1 and use conditional compilation to make RowSet implement IAsyncEnumerable in netstandard2.1, if you're interested in working on this I'd say go for it
Have you tried manual paging as described in the manual?
Thx, I'll give it a try!
I think we can move from netstandard2.0 to netstandard2.1
Done
Reading through this page I see there is a community maintained implementation of these methods, shouldn't we use this package instead?
They even have the workaround listed there for net10 users
Reading through this page I see there is a community maintained implementation of these methods, shouldn't we use this package instead?
I think it's important to keep the dependencies to a minimum. Here we are talking about ~100 lines of very easy code so I would greatly prefer that over an extra dependency.
@joao-r-reis is that PR still on your radar?