drift icon indicating copy to clipboard operation
drift copied to clipboard

Read references from manager queries

Open dickermoshe opened this issue 1 year ago • 10 comments

🚧 WIP 🚧


The current syntax is as follows:

final listings = await db.managers.listing.withReferences().withProduct().get();
for (var l in listings){
  print(l.listingData);
  print(l.references.product); // Statically type to  ProductData? 
  print(l.references.store) // Statically type to void - this is a compile time error https://dart.dev/diagnostics/use_of_void_result
}

Everything is added with joins, so it's all well optimized

Code to try this out is in the manager testing code

dickermoshe avatar May 05 '24 20:05 dickermoshe

The code generator for this will be an abysmal mess until the api design for this is ironed out

dickermoshe avatar May 05 '24 21:05 dickermoshe

After some investigation, it turns you there is no way to read multiple column of a reverse reference in a single query I'm going to go with this implementation where a second query is done in transaction

dickermoshe avatar May 05 '24 21:05 dickermoshe

@simolus3 I've been able to implement a way to get items with their references.

final products = await db.managers.product
    .withReferences()
    .withDepartment()
    .withListings()
    .get();

for (var productWithRefs in products) {
  final product = productWithRefs.productData;
  final department = productWithRefs.references.department;
  final listings = productWithRefs.references.listings;
}

Getting nested references may be possible, but I'm stuck on the current issue: Watching

The only way to get reverse references (like listings above) is to chain the Stream/Future with a 2nd database query. It looks something like this:

$state
.buildJoinedStatement()
.then((products) {
// Get all the results from the query
final productIds = products.map((p)=>p.id);
// Get a list of ALL referenced listings - Avoids N+1 issue
final allReferencedListings = listings.filter((l)=> l.product.id.isIn(productIds )).get();
// Return a tuple List<(product, listings)> 
return products.map((product)=> (product, allReferencedListings.where((listing) => listing.id == product.listingId)))
})

The above works for get, but watching has a major issue: Changes to listings don't trigger the original products stream!

I've started experimenting with combining streams that works on bulk (as opposed to map which only operates on one item at a time), but this requires a new _Selectable class and an expertise in async programming, which I don't have.

I've modified the products section of drift/test/generated/todos.g.dart and drift/test/manager/referenced_reader_test.dart (Lines ~4550) for you to take a look at, thanks!

dickermoshe avatar May 06 '24 00:05 dickermoshe

The API still looks good to me, but I think there should be a way to be more explicit about the impact of joining a reverse reference. Things that suddenly have the potential to make queries much slower should be more obvious than calling withBackreference() somewhere and forgetting about it.

Also, one thing we should consider is the size of generated code (drift is already generating a fair bit of code, and increasing this further has costs on things like analyzer performance). Users are also typically not too happy about generated code they don't need. Code generators are also harder to understand than library code, so the maintenance cost for us is higher. So we should check whether there are ways to simplify this, such as:

  1. By not generating the join-creating logic for each container reader. Instead, I think the main table manager could hold a structure describing all columns involved in references. The runtime would interpret that structure to add the necessary joins. I think the selectable getter could refer to a helper method receiving a list of references to resolve instead of duplicating most of that logic.
  2. This is heavier and impacts usability quite a bit, but perhaps it's still worth it: Most references are probably never joined, so maybe we could approach this from the other and and let users specify which joins they need and then generate a method for those only?

For 2., a possible API I didn't think through at all could perhaps roughly look like this:

class TodoItems extends Table {
  // ... column definitions

  @JoinQuery()
 SomeQueryType<(TodoItem, Category)> withCategory;
}

And then we'd only generate a manager for the @JoinQuery annotations added on a table? I think that could avoid some of the complexity we're paying for a general solution.

These are just early ideas though, there are probably other approaches to reduce some of the complexity in the generator.

simolus3 avatar May 06 '24 21:05 simolus3

I've spent a considerable amount of time on this, but I'll have to agree, in practice this is handing a loaded gun to a child. Reverse references should be returned as a new Selectable that can be watched or awaited.

final category =await  category.filter((f)=> f.id(5)).withReferences().getSingle()
todos = await category.references.todos.get()

The question is how regular joins should be handled.

Option 1

final category =await  category.filter((f)=> f.id(5)).withReferences().getSingle()
final user = category.references.user; // Returns actual User, even if not needed
final todos  = category.references.todos // A class to Stream/Await the Todos

I'm tempted to have withReferences return all by default. It's consistent to reverse references, which are included by default (albeit, as a Selectable, not an actual Dataclass). In addition, withReferences().withListings() does looks kinda redundant.

The drawback is that they are forced to all or none, which lowers performance, especially with watch, where changes to any referenced table will trigger a new event.

Option 2

final category =await  category.filter((f)=> f.id(5)).with().users().getSingle()
final user = category.references.user; // Returns actual User, but only if requested, inconsistent with `todos` which always returns
final todos  = category.references.todos // A class to Stream/Await the Todos

Only include the requested dataclasses, this is better in every other way, except with doesnt seem like it's descriptive enough $ todos and user act differently

  1. todos is always included
  2. todos is a selectable

Thoughts?

dickermoshe avatar May 07 '24 03:05 dickermoshe

I've spent a considerable amount of time on this, but I'll have to agree, in practice this is handing a loaded gun to a child.

Yeah, unfortunately this looks like a hard problem to get right, and since nobody likes to rewrite all their queries we can't really change the API substantially once we have released something.

Reverse references should be returned as a new Selectable that can be watched or awaited.

I agree with that API 👍 A downside would be that we then can't automatically run the 1+n statements in a transaction. But I'm fine with that, users will know that they're separate queries because they have to use a second await and so it's expected that the two queries aren't atomic unless manually wrapped in a transaction.

Regarding option one: How terrible do you think it is from a UX perspective if we just make references nullable by default and then have boolean flags to control what to include? E.g. withReferences(user: true)? And then user would be non-null for that query and developers would still have to call category.references.user!. It's not great, but I think we may have to make some references nullable either way since foreign keys aren't enforced by default (or it could be an optional foreign key). So maybe users will just have to know when it's safe to do a null assertion. Loading all foreign keys by default sounds bad if we have a larger chain of references between tables (e.g. todo -> category -> owning user -> ...).

Only include the requested dataclasses, this is better in every other way, except with doesnt seem like it's descriptive enough $ todos and user act differently

I'm fine with back- and forward-references acting differently.

And maybe that's something that's not ever going to come up, but I still want to ask whether we can make this possible. What if we have unbounded chains of references? E.g. something contrived like

class Employees extends Table {
  IntColumn get id => integer().autoIncrement()();
  IntColumn get supervisor => integer().references(Employees, #id)();
}

In SQL, I could write a query to find the supervisor two layers up in the hierarchy. I think it would be cool if we could have an API where that can be expressed, but it will likely look a lot different than the things we (and probably most other ORMs) have discussed. This would be possible if we had a way to generate code for only the joins that were requested, e.g.

abstract class Employees extends Table {
  IntColumn get id => integer().autoIncrement()();
  IntColumn get supervisor => integer().references(Employees, #id)();

  JoinQuery<({self: Employee, another: Employee})> get withGrandSupervisor => buildJoin(['self', 'another: self.supervisor.supervisor']);
}

I don't want to derail the discussion with these examples (and I'll shut up about them if it sounds like no one will ever need this), but I think that's an opportunity to not become a limitation (every join possible in SQL is possible here), not generate superfluous code (joins have to be requested), and be precise about which tables are included in the result. A downside is that this is harder to use of course, but perhaps we can ease this somewhat by adding syntax sugar on references? E.g.

abstract class Employees extends Table {
  IntColumn get id => integer().autoIncrement()();
  @GenerateJoin
  IntColumn get supervisor => integer().references(Employees, #id)();
}

simolus3 avatar May 07 '24 16:05 simolus3

I'm still thinking about how to proceed on this...

dickermoshe avatar May 12 '24 01:05 dickermoshe

It's not an easy API to design for sure. My "list all joins beforehand" idea also suffers from poor discoverability. I'm happy to help if as well if you have some rough ideas for an API and how the implementation could look like of course.

simolus3 avatar May 12 '24 19:05 simolus3

Let me walk the logic:

Requirements Getting references should be supported without any additional code being written. Reading any other information out of the query (Aggregates) are out of scope at this time, but we should try to do this in a way that will work with aggregates. Everything should be statically typed. Should be well optimized & powerful while still ensuring that developers don't create any performance issues

We should only prefetch what a user explicitly asks for. The withReferences().withUser() works well for this. We should still provide a async way to get the item even if it isn't explicitly requested.

todos.withReferences().withUser().getSingle().references.user // Returns User
todos.withReferences().getSingle().references.user // Returns Future<User> OR Selectable<User>

The above should work with back references too.

The only issue that we are faced with is streaming. Whether we support it, or we don't, we are faced with the same issue. Performance.

If we support the above for streaming, then it is possible that the user can create a stream that is tied to every table in the database.

todos.withReferences().withUser().stream() // Re-trigger on every write to `todos` or `users`

On the other hand, if we don't add this feature, users may do the following

final results = todos.stream.asyncMap((e) async => (todos: e, user: await  users.filter((f)=>f.id(e.user)).getSingleOrNull()))
results.first.todo // The 1st todo
results.first.user // & it's first user

This means that there is a stream for every object being returned, this is really bad. Also, there is no simple way for him to stream the user, unless he starts going into the weeds with RxDart with combineStream.

In a perfect world the user should only be streaming what he needs, in other words, pagination. Pagination is the solution for all of this, allow the user to read entire tables with references, but only stream in batches.

What such an API could look like, I'm still thinking about...

dickermoshe avatar May 12 '24 23:05 dickermoshe

Pagination would definitely be awesome to have, but I think it doesn't really solve the stream invalidation problem (every write to either table would still invalidate every page in every paginated stream, right?). We'd have to parse fewer rows into result classes, but ultimately not run fewer queries than before.

That's not a problem directly related to a join API in my opinion - manual joins also suffer from the same problem. So a pagination API should be orthogonal to a join API and work with or without it (users may also want to paginate simple reads without references to a single table).

simolus3 avatar May 14 '24 13:05 simolus3

@simolus3 What do you think of following?

Let's have all references be accessible by FutureOr. Using groups.withReferences(users: true) will prefetch the users and place it in a cache.

final group1 = await groups.withReferences(users: true).getSingle(); // Get first group, loads users into cache
final users = await group1.users; // Get's users from the cache, doesn't make another query

In previous examples, withReferences().withUsers() would do some typing magic to make group1.users synchronous, However, this increased complexity substantially.

There isn't really a practical benefit from a performance perspective between awaiting FutureOr<User> and User. The only issue is that Flutter has to wait a single frame between loading User. This could be a workaround for that https://stackoverflow.com/questions/73024738/can-i-use-a-futurebuilder-with-a-value-of-the-type-futureort

dickermoshe avatar May 28 '24 18:05 dickermoshe

Since you have to await the query either way I'm not sure it makes much of a difference in the end, most methods using the API will be async either way. So the FutureOr trick looks fine to me :+1:

simolus3 avatar May 28 '24 21:05 simolus3

The following is a happy medium between readability and code length. When requesting references, we return a Manager that already has the filters applied to it.

In practice the code looks like:

// Get user with id #1
final result = await users.filter((f)=>f.id(1)).withReferences().getSingle();
// The actual user
final user = result.user;
// The actual group
final group = await result.group.getSingle();
// Let's say we wanted to get all the users in this group
final allUsersInGroup = await result.group.withReferences().getSingle().users.get();
// Being that we are returning a manager, we can do more filtering:
final allUsersInGroupOver10 = await result.group.withReferences().getSingle().users.filter((f)=>f.age.isGreaterThan(10)).get();

Prefetching and Joining to save from n+1 can be added later backwards compatible.

The code for this is written, I'm updating the docs and tests now...

@simolus3 what do you think?

dickermoshe avatar Jun 16 '24 21:06 dickermoshe

@simolus3 So, what should we do about this?

test/generated/todos.g.dart:3763:48: Error: 
The argument type 'int' can't be assigned to the parameter type 'RowId'.
          .filter((f) => f.id(todosTable.category!));

This is stemming from the fact that in the schema we are using 2 different types for each end of relationship:

https://github.com/simolus3/drift/blob/9155b6856019b3d020032ae451a8082c47a48ed5/drift/test/generated/todos.dart#L12-L56

Can I just do todosTable.category as RowId or should I be doing RowId(todosTable.category)? Or should I be making todosTable.category a RowId in the actual schema?

Either way, allowing users to use 2 different types on each end of relationship sounds like a bad idea. (Even though the base type matches)

dickermoshe avatar Jun 16 '24 22:06 dickermoshe

Can I just do todosTable.category as RowId or should I be doing RowId(todosTable.category)?

If we want to allow this, we should be applying the type converter (categoriesTable.id.converter.fromSql(todosTable.category) probably).

simolus3 avatar Jun 17 '24 20:06 simolus3

Returning pre-populated managers that can then be fetched afterwards is brilliant! I definitely agree with that approach (and also with potentially adding prefetching/caching under the same interface later where it makes sense). Should I review the current state of the PR or did you want to push documentation/test changes first?

simolus3 avatar Jun 17 '24 20:06 simolus3

Nah, wait till I've got it all done. I'll ping you. Just wanted your thoughts on the design before I finished the PR.

Sidepoint, I started a new project today and was using the docs. It's time for a rewrite, over the past 5 years it's gotten fragmented.

Static Shock or MkDocs sound good?

dickermoshe avatar Jun 17 '24 20:06 dickermoshe

Alright, I'll wait :+1:

Static Shock or MkDocs sound good?

I agree that the content could certainly use a structured rewrite and the landing pages need some love, but I prefer the build_runner based workflow for it as we can get everything done in Dart without having to remember to run a series of commands or shell scripts that don't do proper invalidation. The site is doing some advanced things like package:analyzer based syntax highlighting that links to dartdocs of methods referenced in snippets, most static site generators don't support customization like that.

So if you have ideas on how the docs should look/feel like and be structured, I'd love to hear them because I'm not really good at managing the docs page. We can also use a cleaner template, I'd just like to keep the current generator if we don't have something that would be very painful to implement with it.

simolus3 avatar Jun 17 '24 21:06 simolus3

I plan on contributing to Static Shock (it's written in dart). I can add whatever is needed there.

I'll bring this up again once they add source code snippet injection.

dickermoshe avatar Jun 17 '24 23:06 dickermoshe

So if you have ideas on how the docs should look/feel like and be structured, I'd love to hear them because I'm not really good at managing the docs page.

Don't feel bad. I've never seen an ORM with good docs. There must be something fundamentally confusing about it.

I'm going to look around...

dickermoshe avatar Jun 17 '24 23:06 dickermoshe

Not to pat myself on the back, but I'm using this new API on a production project and It's really nice.

Thank you for being so kind and open to contributors. This has all been a really fun experience

dickermoshe avatar Jun 18 '24 02:06 dickermoshe

I took some time off working on this PR to do other things.

I took a look at it again after a week I'm glad I did. The code was pretty messy & a fresh look helped me notice that.

Writing much more tests, cleaning up the reference implementation and updating the docs now. Should hopefully be done sometime sunday.

dickermoshe avatar Jun 21 '24 22:06 dickermoshe

@simolus3

Handling different type converters is making the code really messy. Also, IMHO, I think it's bad practice to use different types for fields that reference each other. I'm ok with the manager generator ignoring those fields.

So this is technically a breaking change. If someone had references this way and is filtering on them using manager API, he would have to add a converter to the other side.

dickermoshe avatar Jun 23 '24 20:06 dickermoshe

I'm going to try experimenting with some sort of prefetching mechanism to this API before merging.

I want to make sure that it's possible with the current API

dickermoshe avatar Jun 24 '24 15:06 dickermoshe

FutureOr is also very annoying to work with. SynchronousFuture is also not recommended. I don't want to have any loading screen when items are prefetched.

https://github.com/dart-lang/sdk/issues/50204 https://github.com/dart-lang/language/issues/3625

Seems that remi has been dealing with this for a while. I'll speak to him.

dickermoshe avatar Jun 24 '24 15:06 dickermoshe

@simolus3 This is becoming a bigger and bigger PR. There are a few breaking changes too. I don't think it will be easy to review.

I'm going to create 4 new PRs

  1. Refactor Manager Code
  2. Bug Fixes
  3. Breaking Changes
  4. New Reference Feature

How does this sound?

dickermoshe avatar Jun 25 '24 18:06 dickermoshe

That sounds good to me, thanks for taking the time to make the changes easier to review :+1:

As for the breaking changes, I'd like to coordinate them with other breaking changes lined up (I have a breaking label on issues for that), it's also a good opportunity to clean up the API somewhat. I try to keep things stable where possible, from experience the major version bumps are always painful and it takes a long time for users to upgrade. We'll likely need a drift 3.0 when introducing macros, so depending on how much the breaking changes are necessary we can either do them then or perhaps make them opt-in with a builder option if possible.

I don't want to have any loading screen when items are prefetched.

IMO if it's either asynchronous or not and we want to do everything with the same classes, FutureOr is still the best type. Casting is annoying, but what better option is there is we truly don't know from the type whether data is there or needs to be loaded.

simolus3 avatar Jun 25 '24 20:06 simolus3

The changes that would be breaking aren't drastic. They could wait for a 3.0.

I didn't realize that you plan on making a migration to macros. I'm super excited. I'm going to learn them!

dickermoshe avatar Jun 26 '24 03:06 dickermoshe