mikro-orm Populating *-to-many relationship partially

The Smart Nested Populate feature is imo the holy grail of ORMs and I'm so thrilled to be finally able to work with an ORM that has it! Back when I was working on PHP projects with Doctrine the first thing I always did was implement a set of wrappers to provide this functionality and I couldn't be happier to find that with Mikro ORM I don't need to.

I've a question though: when used to load a *-to-many relationship, the SNP feature will always load the whole collection of related entities. This might be undesirable in some situations: I might sometimes only need some of them. For example I might want to load the 10 latest articles and for each article the top 3 comments. This can be implemented on the SQL level using window functions, at least in PostgreSQL. I understand that implementing this within MikroORM would be a huge, if not downright impossible, undertaking, considering the need for a clean and generic API - but writing the query isn't the hard part - the hard part is mapping the entities loaded by the query back onto the collections inside the original entities - and that logic must already be implemented within MikroORM, because it's what the Smart Nested Populate feature must be using under the hood.

So my question is this: would it be possible to expose this functionality to user code, so that we can implement these sorts of features if and when needed without having to reimplement the mapping part? Unless of course loading a collection partially might break something within the identity map or UoW..?

Jun 02 '21 09:06 jahudka

I second this, it would come in very handy if we could do something like this:

em.find(Entity, { populate: { nestedRelation: { regularProperty: true } } });

So it would only populate the entities that match that condition. Kind of like the join condition in the querybuilder

Jun 13 '21 12:06 sebastiaanviaene

@sebastiaanviaene I think you can already do this..? Maybe not via the populate argument, but the em.populate() method accepts a FilterQuery as its third argument (basically a WHERE clause, same thing you'd pass as the second argument to em.find()).

What I want to be able to do is to load a bunch of related entities using a custom SQL query via em.getConnection().execute() & possibly em.map() and have the ORM do the heavy lifting of assigning them to the properties / collections of the original entities correctly.

Using the Author and Book entities as an example:

// For a given set of original entities...
const bestAuthors = await em.find(Author, {}, { orderBy: { lastQuarterSales: 'desc' }, limit: 5 });

// I want to load a relation using arbitrary SQL...
const rawLastBooks = await em.getConnection().execute(`
  WITH "sorted_books" AS (
    SELECT "books".*, ROW_NUMBER() OVER (PARTITION BY "author_id" ORDER BY "date_published" DESC) AS "idx"
    FROM "books"
    WHERE "id" IN (?)
  )
  SELECT * FROM "sorted_books" WHERE "idx" < 3
  ORDER BY "idx"
`, [bestAuthors]);

// Maybe converting the raw data to entities myself, but the ORM might be happier
// receiving the raw data in the next step, not sure..
const lastBooksWithAuthorId: [number, Book][] = rawLastBooks.map((data) => [data.author_id, em.map(Book, data)]);

// And now I want to:
em.doMagic(lastBooksWithAuthorId); // or even passing rawLastBooks directly

// And have:
for (const author of bestAuthors) {
  author.books.length <= 3;
}

Jun 13 '21 12:06 jahudka

So it would only populate the entities that match that condition. Kind of like the join condition in the querybuilder

That is exactly how it works now. The condition is propagated automatically to child entities.

Maybe not via the populate argument, but the em.populate() method accepts a FilterQuery as its third argument (basically a WHERE clause, same thing you'd pass as the second argument to em.find()).

Actually the way to do it is via the populate argument, there it works automatically. If you go with dedicated em.populate() call, then you need to pass the child condition yourself. Using the dedicated call is actually the way to not do it - e.g. if you want to limit the root entity based on child properties, but want to load full collections of them regardless of the condition.

Worth noting there is also Collection.matching() that allows partial loading of a collection.

Using the Author and Book entities as an example:

Sounds like you want just this? Not sure if we should have some magical API for this.

const booksByAuthor = new Map<number, Book[]>();

for (const data of rawLastBooks) {
  const items = booksByAuthor.get(data.author_id) ?? [];
  items.push(em.map(Book, data));
  booksByAuthor.set(data.author_id, items);
}

for (const author of bestAuthors) {
  const items = booksByAuthor.get(author.id) ?? [];
  author.books.hydrate(items);
}

Jun 14 '21 08:06 B4nan

Actually the way to do it is via the populate argument, there it works automatically. If you go with dedicated em.populate() call, then you need to pass the child condition yourself. Using the dedicated call is actually the way to not do it - e.g. if you want to limit the root entity based on child properties, but want to load full collections of them regardless of the condition.

I'm not sure I understand this.. So these are not the same?

em.getRepository(Author).findAll({ populate: { book: { title: { $like: 'Life of %' } } } });

// vs.

const authors = await em.getRepository(Author).findAll();
await em.populate(authors, 'book', { book: { title: { $like: 'Life of %' } } });

Worth noting there is also Collection.matching() that allows partial loading of a collection.

Yeah, but that will only load one collection at a time, so N+1 queries, right?

Sounds like you want just this? Not sure if we should have some magical API for this.

Yeah, that's exactly what I want - your example works when I want to do this in one place for one specific relationship, but if I wanted to have a generic solution I'd have to roll my own - one which would analyse the relationship metadata to figure out which side (if any) is collection-valued ("many"), what the foreign keys are called etc., then build a map of one or both sides of the relationship and then stitch things together. MikroORM must already have this implemented internally in order for smart nested populate to work. I'm just asking that the existing functionality be exposed in a way that allows me to do more magic with less work ;-)

Jun 14 '21 15:06 jahudka

@jahudka

I think you can already do this..? Maybe not via the populate argument, but the em.populate() method accepts a FilterQuery as its third argument (basically a WHERE clause, same thing you'd pass as the second argument to em.find()).

This works indeed, but getting it to work with the populate argument is something I have not figured out yet.

working example:

await em.populate(organizations, { memberships: { user: true } }, { memberships: { role: Role.admin } });

@B4nan

That is exactly how it works now. The condition is propagated automatically to child entities.

if I try this for example:

await em.findAll(Organization, { populate: { memberships: { user: true, role: Role.admin } } });

I will get the following error

ValidationError: Entity 'Membership' does not have property 'role'

It only seems to work for 1..x relations

Jun 14 '21 16:06 sebastiaanviaene

So these are not the same?

Nope, the first one is not a valid query, you are mixing populate with where condition, those are separate things.

What I am referring to is this:

await em.find(Author, { books: { title: { $like: 'Life of %' } } }, { populate: ['books'] });

This fires 2 queries, first to load all authors, with where condition checking their attached books for their titles second one to load the books entities, again using the condition to check their titles, so we end up with partially loaded collections - only with items that match the title. The condition is propagated down to the second query that populates books collections (loads book entities).

Yeah, that's exactly what I want - your example works when I want to do this in one place for one specific relationship, but if I wanted to have a generic solution I'd have to roll my own - one which would analyse the relationship metadata to figure out which side (if any) is collection-valued ("many"), what the foreign keys are called etc., then build a map of one or both sides of the relationship and then stitch things together. MikroORM must already have this implemented internally in order for smart nested populate to work. I'm just asking that the existing functionality be exposed in a way that allows me to do more magic with less work ;-)

I am not aware of any internal method that would do this, partially its done in Driver.loadFromPivotTable, but that is only for m:n relations that use pivot table.

I am still not convinced we need more than what I gave you in that example. You will need to handle the grouping yourself, as that is dependent on the query, also there can be composite keys. The rest is just about calling Collection.hydrate() with the right array.

Btw this is already handled in the em.map() as well as em.create():

const author = em.map(Author, { name: '...', books: [{ title: 't1' }] });`

But it always work on a single entity. Check how the em.populate() is implemented (EntityLoader class), it works mainly thanks to the identity map, as we can just load all references for given relation and don't care about mapping.

Maybe one of the private methods of EntityLoader is what you are up? If so, we can make some of them public for sure.

It only seems to work for 1..x relations

Populate is only for releations, so m:1/1:1/1:m/m:n only, not regular properties. Population != partial loading.

https://mikro-orm.io/docs/entity-manager/#fetching-partial-entities

Jun 18 '21 18:06 B4nan

Nope, the first one is not a valid query, you are mixing populate with where condition, those are separate things.

Actually I don't think I am - findAll(), according to this, accepts either a Populate object or a FindOptions object as its first argument, and FindOptions allows a populate property to be specified.

I think there's a misunderstanding about what I'm trying to do.. Let me try and clear it up:

Say I have two entities: Author and Book, and there's a 1:M relationship between them. I somehow obtain an array of Author entities - it's irrelevant how I got them - let's just assume that I have a const authors: Author[] = magic();. If I now call em.populate(authors, 'books');, MikroORM must do two things:

Query the books table for books belonging to the specified authors.
Map the result of the query to Book entities and hydrate the books collections of the Author entities.

The second step is basically what you're describing in the code snippet at the end of your previous comment - but em.populate() must already be able to do this internally for any relationship based on the relationship metadata, even taking into account composite keys and whatnot. Maybe it happens automatically courtesy of the identity map, I'm not sure. But even though em.populate() and FindOptions.populate are pretty powerful, there are some things they can't do - some queries they can't compose - like using ROW_NUMBER() OVER (PARTITION BY author_id ORDER BY publish_date DESC) to get the last 0-3 published books per author. That's fine, I can write & execute the query myself (thus completing step 1), but then I still need to map the raw database rows to entities and then hydrate the relations of the books collections on the original Author entities. So essentially what I'm looking for is:

const authors: Author[] = somehowGetAnArrayOfSomeAuthors();
const books: Book[] = somehowGetSomeBooksBelongingToTheseAuthors(authors);

em.doMagic(authors, books);

for (const author of authors) {
  for (const book of author.books) {
    // should be hydrated
  }
}

Jun 18 '21 19:06 jahudka

Looking at the EntityLoader you mentioned makes me think the initialiseCollections() method might be the ticket

Jun 18 '21 19:06 jahudka

Actually I don't think I am - findAll(), according to this, accepts either a Populate object or a FindOptions object as its first argument, and FindOptions allows a populate property to be specified.

I am talking about what you put inside the populate property, and that is what you have wrong. It might work, but it won't do what you think. Truthy values will work just like passing true. But it will not have any effect on the where query. Both lines do the same thing:

em.getRepository(Author).findAll({ populate: { book: { title: { $like: 'Life of %' } } } });
em.getRepository(Author).findAll({ populate: { book: { title: true } } });

Btw I just tried it, and the first line won't even typecheck, at least not in strict mode (or are you using JS instead of TS?). This is simply not supported, never was, it is also not documented anywhere like that. You are using findAll and that implicitly has no where condition, that is what the method is for.

As I said above, if you want to have condition, correct query looks like this (using find, not findAll):

await em.find(Author, { books: { title: { $like: 'Life of %' } } }, { populate: ['books'] });
// or with populate as object
await em.find(Author, { books: { title: { $like: 'Life of %' } } }, { populate: { books: true } });

Jun 18 '21 19:06 B4nan

I am talking about what you put inside the populate property

Yeah, me too! When I wrote:

em.getRepository(Author).findAll({ populate: { book: { title: { $like: 'Life of %' } } } });

The intended outcome was that all authors would be loaded, and for each of them, all their books with a title matching the specified condition - so findAll authors, but populate only some of their books. Anyway, I thought that's how the populate option works, so thanks for clearing that up, but I think we're getting a little off-topic :-) forget about conditions or findAll() vs. find() - the gist of my question is: having an array of root entities and an array of raw data for a relation of the root entities, how do I go about hydrating the relation of the root entities using the raw data, in a single line of code / single call to a method of the em, and in a way that is generic, i.e. doesn't require me to explicitly specify things like what the foreign key column is called, when these things can be resolved from the entities' metadata?

Jun 18 '21 20:06 jahudka

The intended outcome was that all authors would be loaded, and for each of them, all their books with a title matching the specified condition - so findAll authors, but populate only some of their books

Again, that is not how it works, it never worked this way, and is neither documented, nor tested. Populate parameter only tells what relations to populate, it does not have effect on the where query (other than the in (list of PKs) part).

having an array of root entities and an array of raw data for a relation of the root entities, how do I go about hydrating the relation of the root entities using the raw data, in a single line of code / single call to a method of the em, and in a way that is generic, i.e. doesn't require me to explicitly specify things like what the foreign key column is called, when these things can be resolved from the entities' metadata?

I don't think it has to be a single method, and I don't think it should be on the EM level. For me it is really the hydrate method of collection. You will always be responsible for grouping, which is the first part of my snippet, the second part is just calling hydrate on the collections. For me this is really deadly simple and adding more abstractions won't help much.

If we were to add such method, it would not work with PKs like you want, it would work rather with entities, so you would still have to do the mapping yourself. We can't work with PKs like that as there are composite PKs too, or custom types, etc. We need scalars and PK might not be scalar.

The only one-liner I can think of is em.create() and em.map() which basically does what you want, but for the whole entity graph (create the author entity together with it's collection).

So let's go the other way. You keep telling me to expose a method you want, and are convinced there is one. I keep saying there is no such method, definitely not the one you are imagining. So please describe what you exactly want, propose the public API (including the method name). And think of how it would work with complex cases as with composite keys, FK as PK, PK with custom type (e.g. object value, or a buffer). For me the snippet I gave you is what you should do. It can be easily abstracted to work with entity metadata. We can have that somewhere, maybe on the EntityLoader, but this does not sounds like a good candicate for EM api.

Jun 18 '21 20:06 B4nan

If it's not a single method then the call site needs to know which method to call for a given relationship, which means the call site either needs to be hard-coded, or it needs to inspect the relation metadata to be able to decide automatically, which I believe can also be done behind the scenes by the ORM - but I may be missing something here especially in relation to noSQL databases because I have no experience with them. But to try to answer your question:

I'd like to have a method with the following signature:

em.hydrateRelation<T extends AnyEntity>(entities: T[], relation: keyof T, data: EntityData[]): void;

// usage:
em.hydrateRelation(books, 'author', resultOf('SELECT * FROM author WHERE whatever'));

Whether it's part of the EntityManager API or some other public part of the ORM isn't that important to me, you know best where - if at all - something like this should belong. But what it should do is pretty straightforward, even if the actual code would be complex:

Figure out the metadata behind the operation - ie. get entity metadata for entities, from there get the relation metadata for the specified relation and based on that get the metadata of the target entity of the relation.
This should tell us things like what shape should we expect data to be in, which side(s) of the relation are collection-valued, and so on - so e.g. "Data should be { id: number, name: string, birthday: Date }[], the left-hand side of the relation is not a collection (book.author) and the right hand side is a collection (author.books)".
Based on that we should* be able to create a map of entities on both sides of the relation, and then we only need to stitch the two maps together (iterate over one and use the entries to hydrate values in the other - basically what the second part of your snippet does).

*should, but that's only based on the following assumptions:

In order for the ORM to be able to provide an Identity Map, it must be possible for the ORM to extract a unique entity identifier from the entity raw data, whether this key is scalar or not. The Identity Map must map something to the entities, after all.
Similarly, in order for 1:1 and 1:M relations to work with the Identity Map, it must also be possible for the ORM to extract a unique entity identifier for the inverse side of a relationship from the owning side raw data - e.g. if the ORM is hydrating Book entities it must understand that the author_id property of the raw data should be used to obtain an appropriate unique Author identifier in order to get an Author instance / reference from the Identity Map. Maybe this doesn't happen automatically every time entities with owning sides of relations are loaded, but at least it must happen during populate().
Something along the same lines must also be true for M:N relations, except maybe more complicated when pivot tables are involved.

Does that make sense? Is any of my assumptions wrong? If so, how does it actually work?

Jun 18 '21 22:06 jahudka