csharp-driver icon indicating copy to clipboard operation
csharp-driver copied to clipboard

CSHARP-994 Add AsyncEnumerable support to RowSet

Open verdie-g opened this issue 7 months ago • 10 comments

Any async method in Mapper can block the thread if there are more than one page in RowSet. This can easily starve the thread pool and obliterate the performance of your app.

The proper fix it to make RowSet implement IAsyncEnumerable. The latter is only available in .NET Core 3 though. I could make the driver target this version but it reached end of life 7 years ago. According to versionsof.net/core, the oldest active .NET version is 8 so that's what I picked.

The PR does not compile right now because the .NET 8 target generates a bunch of obsolete warnings (upgraded to errors) about System.Runtime.ConstrainedExecution and TLS1.X. Also, tests are missing.

I would like to get some first feedbacks before resuming the work, especially around adding this new target framework. I believe it's time to add new targets as .NET Standard is now 8 years old.

verdie-g avatar Apr 12 '25 20:04 verdie-g

I agree that making RowSet implement IAsyncEnumerable would be ideal but that will require some discussion about .NET targets so if we're more concerned about Mapper methods internally enumerating through RowSet in a "sync" manner then we could change that without having to rely on IAsyncEnumerable, we would just have to make the mapper iterate "manually" through the rowset in an async way. This would be the cleanest way to fix this without getting into an extended discussion about moving to newer .NET targets.

joao-r-reis avatar Apr 13 '25 20:04 joao-r-reis

Irt .NET targets it would make more sense to move from .NET Standard 2.0 to .NET Standard 2.1 as that would allow us to keep supporting .NET platforms that support .NET standard while at the same time being able to leverage features like IAsyncEnumerable

joao-r-reis avatar Apr 13 '25 20:04 joao-r-reis

Also the mapper not relying on IAsyncEnumerable will also make it so that issue is fixed in .NET Framework as well (async streams are not supported on .NET Framework)

joao-r-reis avatar Apr 13 '25 20:04 joao-r-reis

we would just have to make the mapper iterate "manually" through the rowset in an async way.

https://github.com/datastax/csharp-driver/blob/aa4f36e5c2e01042de82bab42a004bbd9a1c594b/src/Cassandra/Mapping/ICqlQueryAsyncClient.cs#L30

This is tricky because FetchAsync returns an IEnumerable. So, it tranfers the responsibility to enumerate the rows to the caller. And since they only have a synchronous API, all they can do is to block the thread.

In my company, I have applied some work arounds where the performance impact was alarming. The first one was to greatly increase the page size so all the rows would fit on a single page. That greatly impacted the GC of the client but it also impacted the server and I had to find another work around when the team operating cassandra came to us. The second one was this

var rows  = await _session.ExecuteAsync(stmt, profile);

while (!rows.IsFullyFetched)
{
    await rows.FetchMoreResultsAsync();
}

return rows.Select(r =>
{
    var x = r.GetValue<int>(0);
    var y = r.GetValue<string>(1);
    var z = r.GetValue<byte[]>(2);

    return new MyObject(x, y, z);
}).ToArray();

This seems to be fix the thread starvation too and is more friendly to the GC. Though, all the rows must be in memory at some point.

But anyway it's just a work-around that most developers won't know about.

Also the mapper not relying on IAsyncEnumerable will also make it so that issue is fixed in .NET Framework as well

One good thing about that IEnumerable API is that all the rows don't have to be loaded in memory at the same time. The developer can chose to load them at their own pace. I'm not sure we can achieve the same in async without IAsyncEnumerable.

implement IAsyncEnumerable would be ideal but that will require some discussion about .NET targets Irt .NET targets it would make more sense to move from .NET Standard 2.0 to .NET Standard 2.1

I believe a serious discussion should be had around which .NET version to target. I'm sure you have many clients still using .NET Framework but double targeting the LTS could allow developers using more recent features of .NET while still support old ones.

verdie-g avatar Apr 13 '25 21:04 verdie-g

In my company, I have applied some work arounds where the performance impact was alarming. The first one was to greatly increase the page size so all the rows would fit on a single page. That greatly impacted the GC of the client but it also impacted the server and I had to find another work around when the team operating cassandra came to us. The second one was this

Have you tried manual paging as described in the manual? That should fix the issue but it's a bit more code than relying on automatic paging. This is generally the best approach if you're concerned about thread starvation and GC pressure.


I think we can move from netstandard2.0 to netstandard2.1 and use conditional compilation to make RowSet implement IAsyncEnumerable in netstandard2.1, if you're interested in working on this I'd say go for it

joao-r-reis avatar Apr 15 '25 14:04 joao-r-reis

Have you tried manual paging as described in the manual?

Thx, I'll give it a try!

I think we can move from netstandard2.0 to netstandard2.1

Done

verdie-g avatar Apr 15 '25 23:04 verdie-g

Reading through this page I see there is a community maintained implementation of these methods, shouldn't we use this package instead?

joao-r-reis avatar Apr 17 '25 14:04 joao-r-reis

They even have the workaround listed there for net10 users

joao-r-reis avatar Apr 17 '25 14:04 joao-r-reis

Reading through this page I see there is a community maintained implementation of these methods, shouldn't we use this package instead?

I think it's important to keep the dependencies to a minimum. Here we are talking about ~100 lines of very easy code so I would greatly prefer that over an extra dependency.

verdie-g avatar Apr 17 '25 17:04 verdie-g

@joao-r-reis is that PR still on your radar?

verdie-g avatar Apr 28 '25 13:04 verdie-g