DataAccessPerformance
API consideration: `ExecuteAsync`
Some initial thoughts; not sure if I should log these individually or whether this is fine, but:

- not sure how well it works for non-query and multi-grid query scenarios
- the factory approach as illustrated by `Execute_success()` with the list add in the factory precludes non-buffered data streams; really feels like the primary API should expose something more akin to async-enumerable (or a spoofed similar if that still isn't nailed down)
- the column bind mechanism looks ... unexpected; personally I would have expected that to be part of the factory method, i.e. `Func<SomeRowApi, TResult>`; but in particular the current API is incompatible with both immutable types (which are a thing) and value types (because of pass-by-value) (edit: I guess technically you could work with immutable types via `return obj.WithName(row.ReadString())` etc in each branch of a `switch`, but it would be horribly inefficient for classes)
- not sure quite how well the arbitrary name prepare/execute is going to map to other providers
- whole topic of parameterization
- whole topic of column metadata inspection - meaning: as a library author, I want to construct the factory/binder at runtime based on looking at the columns
- might need a whole lot more async when dealing with large data sets or wide columns (multiple large strings etc) - perhaps via `ValueTask<T>`-based `Read*Async` APIs? (a rough shape is sketched just below)
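To make that last bullet a bit more concrete, here is a rough sketch of what a `ValueTask<T>`-based row API could look like; `IRowApi` and its members are purely hypothetical names for illustration, not anything the repo actually defines:

```csharp
using System.Threading.Tasks;

// Hypothetical shape only: fixed-size values read synchronously from the
// current buffer, while potentially-large values get ValueTask-based reads
// that only go async when more data has to be pulled from the network.
public interface IRowApi
{
    int ReadInt32();
    double ReadDouble();
    ValueTask<string> ReadStringAsync();
    ValueTask<byte[]> ReadBytesAsync();
}
```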
I appreciate the current status is minimal viable working code and that most of these things are almost certainly on your radar, but I wanted to throw them down.
In particular, re the primary multi-row API, my "gut" suggests something more like:
```csharp
public WhateverAsyncEnumerableLooksLike<TResult> Execute<TResult>(...,
    Func<RowAPI, Task<TResult>> factory) => Execute(..., _ => factory);

public WhateverAsyncEnumerableLooksLike<TResult> Execute<TResult>(...,
    Func<MetadataAPI, Func<RowAPI, Task<TResult>>> factoryFactory); // oh shit, I think I just java'd myself
```
so that the following is possible (for the "we know what the data looks like, thanks" case):
```csharp
var data = session.Execute<Customer>(..., row => ReadCustomerAsync(row));
...
static async Task<Customer> ReadCustomerAsync(RowAPI row) {
    var customer = new Customer();
    customer.Id = row.ReadInt32(); // in my head I'm assuming this increments the column automagically
    customer.Notes = await row.ReadStringAsync(); // value task; might be in the buffer, might not?
    customer.Foo = row.ReadDouble();
    ...
    return customer;
}
```
whereas a library-based reader (that doesn't know about types ahead of time) might be more like:

```csharp
var data = session.Execute<T>(..., metadata => GetTypeReader<T>(metadata));
```
where `GetTypeReader` looks at the `T` and the metadata, and hands back a (possibly cached) configuration/strategy-generated `Func<RowAPI, Task<T>>`.
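As a hedged sketch of that library case (every name below, `MetadataAPI`, `RowAPI`, `TypeReaderCache`, is invented for illustration, and the actual column-to-member binding is deliberately left out):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Placeholder shapes purely for the sketch; not real Peregrine types.
public sealed class MetadataAPI { public string[] ColumnNames { get; set; } }
public sealed class RowAPI { }      // reader members as sketched in the examples above

public static class TypeReaderCache
{
    // NOTE: keyed on T only for brevity; a real cache would also need some
    // notion of schema identity, since different queries have different columns.
    static readonly ConcurrentDictionary<Type, Delegate> _cache =
        new ConcurrentDictionary<Type, Delegate>();

    public static Func<RowAPI, Task<T>> GetTypeReader<T>(MetadataAPI metadata)
        => (Func<RowAPI, Task<T>>)_cache.GetOrAdd(typeof(T), _ => BuildReader<T>(metadata));

    static Func<RowAPI, Task<T>> BuildReader<T>(MetadataAPI metadata)
    {
        // A real library would inspect metadata.ColumnNames here and compile an
        // expression tree (or emit IL) that maps columns onto T's members; the
        // generated delegate is what ends up cached and reused per row.
        throw new NotImplementedException("column-to-member binding goes here");
    }
}
```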
If `Customer` was immutable, then the same would be:
```csharp
static async Task<Customer> ReadCustomerAsync(RowAPI row) {
    var id = row.ReadInt32(); // in my head I'm assuming this increments the column automagically
    var notes = await row.ReadStringAsync(); // value task; might be in the buffer, might not?
    var foo = row.ReadDouble();
    ...
    return new Customer(id, notes, foo, ...);
}
```
The key point I'm trying to explore here is essentially: when and how can callers inspect column metadata, and when and how can they use that to influence what Peregrine is doing? And also: when and how do immutable types and value types fit in?
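As a hedged aside on the value-type half of that question: with a factory-style callback the pass-by-value problem largely goes away, because the callback returns the value rather than mutating a pre-bound instance. Assuming the same hypothetical `RowAPI` as the examples above (and a trivial struct invented just for this sketch):

```csharp
using System.Threading.Tasks;

public readonly struct CustomerKey
{
    public CustomerKey(int id, string region) { Id = id; Region = region; }
    public int Id { get; }
    public string Region { get; }
}

static class ValueTypeExample
{
    // Same pattern as the immutable-class version above, just with a struct:
    // read the columns, then construct and return the value in one go.
    internal static async Task<CustomerKey> ReadCustomerKeyAsync(RowAPI row)
    {
        var id = row.ReadInt32();
        var region = await row.ReadStringAsync();
        return new CustomerKey(id, region);
    }
}
```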
After some thought, I wonder whether returning something akin to `IDataReader` but backed by the new impl is frankly simpler and more flexible...
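For comparison, here is a purely hypothetical sketch of what an `IDataReader`-ish surface over the new implementation might look like (all names are assumptions, not anything taken from the repo):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical: forward-only and roughly IDataReader-shaped, but with
// ValueTask-based advancement and string reads so that large values can
// await more network data without allocating a Task on every call.
public interface ILowLevelDataReader : IDisposable
{
    int FieldCount { get; }
    string GetName(int ordinal);

    ValueTask<bool> ReadAsync();        // advance to the next row
    ValueTask<bool> NextResultAsync();  // advance to the next result grid

    int GetInt32(int ordinal);
    double GetDouble(int ordinal);
    ValueTask<string> GetStringAsync(int ordinal);
}
```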
@mgravell Excellent points. Thanks for taking the time to put this together.
This code shouldn't be considered any kind of API proposal and I agree completely that it is terrible! 😄 Right now, this is just an experiment (unfinished) we put together to help us establish a perf. baseline for one data access scenario.
The initial results are promising in that we see 25%-33% throughput improvement over ADO.NET, and we get pretty close to the numbers produced by the pgbench tool. So we know it is possible!
We plan to evolve this some more (@davidfowl wants to try his Pipelines magic), but we would also love to get PRs on this. I think it would be great to have a bunch of different API proposals in here that we could compare.
@mgravell Another thing to bear in mind: the goal here is to be fast (and low-allocating), and we expect to have to trade some amount of usability to get there. The idea is that frameworks like Dapper and EF can use these very low-level APIs, but they may not be great for general consumption.
Absolutely - and I'm 100% thinking about the API surface from the perspective of what I'd need to do to build a library layer like dapper (or EF) on top of it. It was exactly those considerations that motivated most of the points.