DataAccessPerformance
API consideration: `ExecuteAsync`
Some initial thoughts; not sure if I should log these individually or whether this is fine, but:

- not sure how well it works for non-query and multi-grid query scenarios
- the factory approach as illustrated by `Execute_success()` with the list add in the factory precludes non-buffered data streams; really feels like the primary API should expose something more akin to async-enumerable (or a spoofed similar if that still isn't nailed down)
- the column bind mechanism looks ... unexpected; personally I would have expected that to be part of the factory method, i.e. `Func<SomeRowApi, TResult>`; but in particular the current API is incompatible with both immutable types (which are a thing) and value types (because of pass-by-value) (edit: I guess technically you could work with immutable types via `return obj.WithName(row.ReadString())` etc in each branch of a `switch`, but it would be horribly inefficient for classes)
- not sure quite how well the arbitrary name prepare/execute is going to map to other providers
- whole topic of parameterization
- whole topic of column metadata inspection - meaning: as a library author, I want to construct the factory/binder at runtime based on looking at the columns
- might need a whole lot more async when dealing with large data sets or wide columns (multiple large strings etc) - perhaps via `ValueTask<T>`-based `Read*Async` APIs? (a rough shape is sketched just below)
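To make that last bullet a bit more concrete, here is a rough sketch of what a `ValueTask<T>`-based row API could look like; `IRowApi` and its members are purely hypothetical names for illustration, not anything the repo actually defines:

```csharp
using System.Threading.Tasks;

// Hypothetical shape only: fixed-size values read synchronously from the
// current buffer, while potentially-large values get ValueTask-based reads
// that only go async when more data has to be pulled from the network.
public interface IRowApi
{
    int ReadInt32();
    double ReadDouble();
    ValueTask<string> ReadStringAsync();
    ValueTask<byte[]> ReadBytesAsync();
}
```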
I appreciate the current status is minimal viable working code and that most of these things are almost certainly on your radar, but I wanted to throw them down.
In particular, re the primary multi-row API, my "gut" suggests something more like:
```csharp
public WhateverAsyncEnumerableLooksLike<TResult> Execute<TResult>(...,
    Func<RowAPI, Task<TResult>> factory) => Execute(..., _ => factory);

public WhateverAsyncEnumerableLooksLike<TResult> Execute<TResult>(...,
    Func<MetadataAPI, Func<RowAPI, Task<TResult>>> factoryFactory); // oh shit, I think I just java'd myself
```
so that the following is possible (for the "we know what the data looks like, thanks" case):
```csharp
var data = session.Execute<Customer>(..., row => ReadCustomerAsync(row));
...
static async Task<Customer> ReadCustomerAsync(RowAPI row) {
    var customer = new Customer();
    customer.Id = row.ReadInt32(); // in my head I'm assuming this increments the column automagically
    customer.Notes = await row.ReadStringAsync(); // value task; might be in the buffer, might not?
    customer.Foo = row.ReadDouble();
    ...
    return customer;
}
```
whereas a library-based reader (that doesn't know about types ahead of time) might be more like:

```csharp
var data = session.Execute<T>(..., metadata => GetTypeReader<T>(metadata));
```
where `GetTypeReader` looks at the `T` and the metadata, and hands back a (possibly cached) configuration/strategy-generated `Func<RowAPI, Task<T>>`.
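As a hedged sketch of that library case (every name below, `MetadataAPI`, `RowAPI`, `TypeReaderCache`, is invented for illustration, and the actual column-to-member binding is deliberately left out):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Placeholder shapes purely for the sketch; not real Peregrine types.
public sealed class MetadataAPI { public string[] ColumnNames { get; set; } }
public sealed class RowAPI { }      // reader members as sketched in the examples above

public static class TypeReaderCache
{
    // NOTE: keyed on T only for brevity; a real cache would also need some
    // notion of schema identity, since different queries have different columns.
    static readonly ConcurrentDictionary<Type, Delegate> _cache =
        new ConcurrentDictionary<Type, Delegate>();

    public static Func<RowAPI, Task<T>> GetTypeReader<T>(MetadataAPI metadata)
        => (Func<RowAPI, Task<T>>)_cache.GetOrAdd(typeof(T), _ => BuildReader<T>(metadata));

    static Func<RowAPI, Task<T>> BuildReader<T>(MetadataAPI metadata)
    {
        // A real library would inspect metadata.ColumnNames here and compile an
        // expression tree (or emit IL) that maps columns onto T's members; the
        // generated delegate is what ends up cached and reused per row.
        throw new NotImplementedException("column-to-member binding goes here");
    }
}
```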
If `Customer` was immutable, then the same would be:
```csharp
static async Task<Customer> ReadCustomerAsync(RowAPI row) {
    var id = row.ReadInt32(); // in my head I'm assuming this increments the column automagically
    var notes = await row.ReadStringAsync(); // value task; might be in the buffer, might not?
    var foo = row.ReadDouble();
    ...
    return new Customer(id, notes, foo, ...);
}
```
The key point I'm trying to explore here is essentially: when and how can callers inspect column metadata, and when and how can they use that to influence what Peregrine is doing? And also: when and how do immutable types and value types fit in?
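As a hedged aside on the value-type half of that question: with a factory-style callback the pass-by-value problem largely goes away, because the callback returns the value rather than mutating a pre-bound instance. Assuming the same hypothetical `RowAPI` as the examples above (and a trivial struct invented just for this sketch):

```csharp
using System.Threading.Tasks;

public readonly struct CustomerKey
{
    public CustomerKey(int id, string region) { Id = id; Region = region; }
    public int Id { get; }
    public string Region { get; }
}

static class ValueTypeExample
{
    // Same pattern as the immutable-class version above, just with a struct:
    // read the columns, then construct and return the value in one go.
    internal static async Task<CustomerKey> ReadCustomerKeyAsync(RowAPI row)
    {
        var id = row.ReadInt32();
        var region = await row.ReadStringAsync();
        return new CustomerKey(id, region);
    }
}
```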
After some thought, I wonder whether returning something akin to `IDataReader` but backed by the new impl is frankly simpler and more flexible...
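For comparison, here is a purely hypothetical sketch of what an `IDataReader`-ish surface over the new implementation might look like (all names are assumptions, not anything taken from the repo):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical: forward-only and roughly IDataReader-shaped, but with
// ValueTask-based advancement and string reads so that large values can
// await more network data without allocating a Task on every call.
public interface ILowLevelDataReader : IDisposable
{
    int FieldCount { get; }
    string GetName(int ordinal);

    ValueTask<bool> ReadAsync();        // advance to the next row
    ValueTask<bool> NextResultAsync();  // advance to the next result grid

    int GetInt32(int ordinal);
    double GetDouble(int ordinal);
    ValueTask<string> GetStringAsync(int ordinal);
}
```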
@mgravell Excellent points. Thanks for taking the time to put this together.
This code shouldn't be considered any kind of API proposal and I agree completely that it is terrible! 😄 Right now, this is just an experiment (unfinished) we put together to help us establish a perf. baseline for one data access scenario.
The initial results are promising in that we see 25%-33% throughput improvement over ADO.NET, and we get pretty close to the numbers produced by the pgbench tool. So we know it is possible!
We plan to evolve this some more (@davidfowl wants to try his Pipelines magic), but we would also love to get PRs on this. I think it would be great to have a bunch of different API proposals in here that we could compare.
@mgravell Another thing to bear in mind: the goal here is to be fast (and low-allocating), and we expect to have to trade some amount of usability to get there. The idea is that frameworks like Dapper and EF can use these very low-level APIs, but they may not be great for general consumption.
Absolutely - and I'm 100% thinking about the API surface from the perspective of what I'd need to do to build a library layer like dapper (or EF) on top of it. It was exactly those considerations that motivated most of the points.