RethinkDb.Driver icon indicating copy to clipboard operation
RethinkDb.Driver copied to clipboard

Simplify run helpers?

Open oliverjanik opened this issue 9 years ago • 13 comments

One one the first things that confused me was why are there so many Run* methods.

Let's see (excluding async versions):

  1. object Run()
  2. object Run<T>()
  3. void RunNoReply()
  4. Cursor<T> RunCursor<T>()
  5. T RunAtom<T>()
  6. T RunResult<T>()
  7. Result RunResult()
  8. Cursor<Change<T>> RunChanges<T>()
  9. IEnumerable<GroupedResult<TKey, TItem>> RunGrouping<TKey, TItem>()

Why is (2) needed? If i give it a <T> and it returns object it's not at useful as it could be. Here's suggestion:

  1. object Run()
  2. T Run<T>()
  3. Cursor<T> RunCursor<T>()

If you need Result you'd just do Run<Result>(), if you need Atom you'd do Run<int>.

What do you think?

oliverjanik avatar Apr 20 '16 02:04 oliverjanik

Hey Oliver,

To answer your question (from documentation) the differences are subtle but valuable differences. Each vary in strictness in what they can deserialize:

Wiki: Run Helpers

.RunAtom

When the response type for a query is a SUCCESS_ATOM the driver is expected to return an object of T. Typically, queries that return singleRowSelection are atom responses.

Hero result = R.Db("marvel").Table("heros")
              .Get("iron_man")
              .Run<Hero>(conn);

var result = R.Db("marvel").Table("heros")
              .Get("iron_man")
              .RunAtom<Hero>(conn);

Emphasis on parsing SUCCESS_ATOM responses only.


.RunResult<T> (Generic)

.RunResult<T>() helps deserialize T object atom (SUCCESS_ATOM) OR finished sequences List<T> (SUCCESS_SEQUENCE) List<T> server responses.

Either SUCCESS_ATOM or SUCCESS_SEQUENCE.


.RunResult (Non Generic) DML

.RunResult() helps with DML (data manipulation language) type of result queries:

var result = R.Db("marvel").Table("heros")
              .Insert({"name":"Iron Man"})
              .RunResult(conn);
/*
{
    "deleted": 0,
    "errors": 0,
    "inserted": 1,
    "replaced": 0,
    "skipped": 0,
    "unchanged": 0
}
*/

As for .Run<T>, per documentation:

Wiki: Dynamic Language Runtime (DLR) Integration

/* long result is 4 of type long */
long result = R.Table("foobar").Get("a")["Baz"].Run(conn);

// Give the compiler and run-time more type information with run<T>()
/* int result is 4 of type int */
int  result = R.Table("foobar").Get("a")["Baz"].Run<int>(conn);

Notice the result declarations long and int and their respective Run and Run<T> calls. The underlying deserializer, Newtonsoft.Json determined the underlying type (without T) is a long. Given T as int, the deserializer can make a more specific deserialization of the result.


Thanks, Brian

bchavez avatar Apr 20 '16 04:04 bchavez

Ultimately, though, I like the way run helpers cover all response types without involving DLR. I was using the driver in my projects and I ran into situations where I wanted .Run to do more. I prefer having the .RunMethod(conn) call to be as simplified as possible.

Imagine:

var result = 
      R.Expr(games).Group("player")
       .Run<GroupedResult<string, Game>>(conn);
       // Eh. Double Generic. Not so pretty nor elegant ...

versus:

var result = 
    R.Expr(games).Group("player")
     .RunGrouping<string, Game>(conn);
     // legit

If you have better naming suggestions for the run helpers we could discuss those suggestions.

bchavez avatar Apr 20 '16 04:04 bchavez

Thanks Brian, I have read that documentation. I'm playing devil's advocate here.

As the user of the library I don't quite see why there is a difference between Run<T>, RunResult<T>, RunAtom<T> or why I should even care. They all return json that I want serialized back to <T>.

  1. The object Run<T> is the most confusing. If I ask for a an int why does it return dynamic? In what situation would this be useful?
  2. The T RunResult<T> covers other ones, doesn't it? if result is SUCCESS_ATOM then RunResult<T> does exact same thing as RunAtom<T>, correct?
  3. And RunResult<Result>() does the same thing as RunResult(), right?

oliverjanik avatar Apr 20 '16 04:04 oliverjanik

Hi @oliverjanik ,

No worries. :+1:

Regarding 1

We briefly had a technical discussion about this in the early days of the driver. I believe the discussion started here: https://github.com/bchavez/RethinkDb.Driver/issues/4#issuecomment-149922415

IIRC, T .Run<T> becomes an issue with Cursors. While it's easy for us to return T as int or long or Person. T is freely provided by the compiler and can be used in the C# language by generic parameter. What is not so easy (nor free) is determining T in .Run<Cursor<T>>, where T is the cursor's item type. We need T in .Run<Cursor<T>> when deserializing Cursors. To do that, we would need to use System.Reflection to probe for T in Run<Cursor<T>>, and as we know, System.Reflection is slow and gets ugly fast. Newtonsoft has no idea how to construct Cursor<T> from JSON.

So, dynamic .Run<T> will get us: T as int or long or Person or Cursor<T> :sparkles: free without much hassle. We have T immediately available as a generic argument. Additionally, dynamic .Run<T> behavior is consistent with the Java driver and all the other official drivers where return type could be T or Cursor<T>.

Regarding 2

The T RunResult<T> covers other ones, doesn't it? if result is SUCCESS_ATOM then RunResult<T> does exact same think as RunAtom<T>, correct?

RunResult<T> handles both SUCCESS_ATOM and SUCCESS_SEQUENCE responses. But .RunAtom only handles SUCCESS_ATOM. So, .RunAtom could be useful to someone out there that only wants to ensure query SUCCESS_ATOM responses. There may be a use case and would like to leave this door open for them.

Regarding 3

And RunResult<Result>() does the same thing as RunResult(), right?

Yes, I think so. In terms of aesthetics and style I think RunResult() is a lot better than RunResult<Result>(). :sunglasses:


As the user of the library I don't quite see why there is a difference between Run<T>, RunResult<T>, RunAtom<T> or why I should even care. They all return json that I want serialized back to <T>.

You make a good point. I would agree that the protocol details have bled into run helpers (such as needing to know what your response type is: SUCCESS_ATOM, SUCCESS_SEQUENCE, SUCCESS_PARTIAL). We're giving a full range of performance options albeit at a slightly cost of complexity of needing to know which run helper to use correctly. We could be open to suggestions on better naming of the run helpers but removing some I don't think is a good idea.

Thoughts?

bchavez avatar Apr 20 '16 05:04 bchavez

@bchavez

I'm running into a problem with Run helpers. I have a query like this:

r.db.table().getAll()

Which returns Cursor, so I use RunCursor<>. Now when I modify the query like this:

r.db.table().getAll().orderBy()

It doesn't return cursor, it returns Atom.

Which helper do I use to cover both cases? The orderby expression is appended on the fly.

Edit: I just found this: http://stackoverflow.com/questions/25375082/when-will-rethinkdb-return-a-cursor It seems it's entirely up to server to return whatever response type it likes and drivers need to cope.

Should RunCursor support all response types including non streaming ones and just provide in memory shim for iteration?

oliverjanik avatar May 18 '16 02:05 oliverjanik

Hey @oliverjanik, depends on the response type. I'd need to see the protocol traces for each. There are two types of cursor responses: cursor partial and cursor complete. If it's a complete cursor response then .RunResult<T> run helper can parse both atom and cursor complete. However, _I think_ it would be unsafe to use a run helper in this respect because a table().getall() could return a partial cursor eventually causing .RunResult<T> to throw.

Also, if you need any more help, I'm available on slack. http://slack.rethinkdb.com @bchavez

bchavez avatar May 18 '16 02:05 bchavez

One thing about run helpers that bothered me was the sort of ambiguous method overload RunResult method:

  1. T RunResult<T>()
  2. Result RunResult()

So, I've been thinking about renaming the 7. Result RunResult() to:

  1. Result RunWrite().

The Go driver uses RunWrite and I think the name is better suited for what it actually does, write to the database and get a write Result response. :)

This will probably make it into 2.4 driver release with an [Obsolete] on Result RunResult().

:sunny: :crescent_moon: "I've been counting down the days and the nights..."

bchavez avatar Apr 25 '18 17:04 bchavez

Hi @bchavez This has bit me again.

I have situation where a query gets built dynamically based on various parameters.

This depending actual query that gets issued I can get one of 3 responses:

  1. SUCCESS_SEQUENCE
  2. SUCCESS_PARTIAL
  3. SUCCESS_ATOM with an array

From my point of view I just need to return an array out of my API.

Is there a single Run Helper method that covers the 3 scenarios?

RunCursor does not handle SUCCESS_ATOM, RunResult<List<T>> does not handle SUCCESS_PARTIAL.

would it be possible for RunCursor to support SUCCESS_ATOM if the result is an array?

oliverjanik avatar Aug 22 '19 04:08 oliverjanik

Hi @oliverjanik,

Pretty interesting problem and scenario.

My initial thoughts are not to make contract changes to .RunCursor() as it relates to the protocol responses being read from the wire.

However, I think there is an escape hatch you can utilize. Specifically, the undocumented .RunUnsafeAsync() method. See https://github.com/bchavez/RethinkDb.Driver/pull/141 for more info.

Basically, send your dynamic query down .RunUnsafeAsync(conn) and handle the raw Response object you get back manually.

Normally, the Run Helpers, like .RunResult() take a Response object and check the kind of response via Response.IsAtom or Response.IsSequence and do something specific with the response type. See below:

https://github.com/bchavez/RethinkDb.Driver/blob/044749812e51b918c09724cbe54a7d99e281cfc1/Source/RethinkDb.Driver/Net/Connection.cs#L327-L344

https://github.com/bchavez/RethinkDb.Driver/blob/044749812e51b918c09724cbe54a7d99e281cfc1/Source/RethinkDb.Driver/Net/Response.cs#L114-L118

The Response.IsAtom, Response.IsSequence, and Response.IsPartial help determine the kind of response to process SUCCESS_ATOM, SUCCESS_SEQUENCE, or SUCCESS_PARTIAL respectively. Then the Run Helpers sometimes apply transformations over the JSON data (ie: Converter.ConvertPseudoTypes). Finally, the Run Helpers returns the appropriate type for consumption.

A possible solution to your problem would probably involve having a run helper extension method that uses .RunUnsafeAsync under the hood that processes the Response object manually and can handle all 3 response types for your specific use case. Be mindful of disposing of any new Cursor objects you create manually or you could run the risk creating a memory leak.

Let me know if you think this approach could help.

Thanks, Brian

:walking_man: :walking_woman: Missing Persons - Walking in L.A.

bchavez avatar Aug 22 '19 06:08 bchavez

Also, another approach could be to use some ReQL primitives that force the response object to always be SUCCESS_SEQUENCE or SUCCESS_PARTIAL, but never SUCCESS_ATOM. For example, .CoerceTo could help or something like r.expr([]).prepend(r.expr({foo:'bar'})) which forces the {foo:'bar'} object literal of SUCCESS_ATOM into an array of a SUCCESS_SEQUENCE response type.

bchavez avatar Aug 22 '19 06:08 bchavez

My solution at the moment is to coerce everything to "array". I'm wondering if I lose out on benefits of actual cursors then. Or maybe it might strain the DB too much when requesting large dataset.

I'm a simple man when I issue Select clause against SQL Server I always get a dataset I can iterate. I don't care if it's cursor or list or atom or whatever.

With RethinkDB it seems to decide on the fly what kind of response it gives. Makes things rather non-trivial when using the driver. I'm not sure how predictable the response type is, given same input.

I'll look into the RunUnsafeAsync, that's my last resort.

oliverjanik avatar Aug 23 '19 03:08 oliverjanik

Thanks Oliver. I appreciate your perspective. Let me know if you get an implementation using .RunUnsafeAsync() working.

Perhaps we can look at adding a new run helper like .RunEnumerableAsync(), or .RunReadAsync() that always returns something that can be enumerated over regardless of the response type.

I've been planning to implement IAsyncEnumerable when C# 8 and .NET Standard 2.1 hit, so that's probably around the timeframe we can start looking at what we can do to improve the situation w.r.t dynamic queries.

Goes without saying, much of what we do in programming has some kind of trade-off. Also, let's not forget the spirit of the run helpers is to avoid the performance hit from the DLR. Many of these helper methods are tools to help you get closer to the wire. As such, protocol details like this, unfortunately, bubble up through the API.

bchavez avatar Aug 23 '19 04:08 bchavez

Perhaps we can look at adding a new run helper like .RunEnumerableAsync(), or .RunReadAsync() that always returns something that can be enumerated over regardless of the response type.

I am voting for this feature/methods. It would simplify driver use cases for "end-programmer", because IEnumerable output is expected outcome from the "SELECT-like" queries.

walterX12 avatar Feb 25 '20 10:02 walterX12