drogon icon indicating copy to clipboard operation
drogon copied to clipboard

Eager Loding/Relationship Caching

Open Gab-Menezes opened this issue 2 years ago • 10 comments

Is your feature request related to a problem? Please describe. The raltionships are not cached by the ORM, which leads to unecessary DB access. Also not having a eager loading system can cause major performance issues.

Describe the solution you'd like Add in the generated model file a way to eager load and use the cached relationship (this second part can be controlled by the model.json file for each relationship)

Describe alternatives you've considered I was think something similar to Eloquent from Laravel:

$user = User::query()
->with('posts', function(Builder $query) {
    $query->with('comments')
        ->with('banner', function(Builder $query) {
            $query->with('metadata');
        });
})
->get();

This will run 5 queries, one for users, one for posts, one for the comments, one for the banner and one for metadata. But since I can't think a way to load the relationship only using the name as a string a thought in generate some static methods in the model class for each relationship so it can be eager loaded. Instead of passing a Query Builder to the anonymous function

DbClientPtr client = drogon::app().getDbClient();
Mapper<Users> mp(client);
std::vector<Users> users = mp.findAll()
Users::loadPosts(users, [](std::vector<Posts>& posts) {
    Posts::loadComments(posts);
    Posts::loadBanner(posts, [](std::vector<Banners>& banners) {
        Banners::loadMetadata(banners);
    })
});

The type of the static function would be void(std::vector<T>& model) (It can't be const std::vector<T>& because the entities inside the vector need to be modified). As you can see the callback is optional, because maybe we don't want to load nothing after that relationship. The eager loading functions only accepts a vector because N+1 problems don't occur when you have a single model and want to retrive it's relationship. The generated sql for the eager loading would be something like this:

select * from posts where user_id IN ($1, $2, ...);
select * from comments where post_id IN ($1, $2, ...);
select * from banners where post_id IN ($1, $2, ...);
select * from meta_data where banner_id IN ($1, $2, ...);

After this I would loop over the relationships and distribute then for the correct parent. Everything would be stored in a new member variable inside each model (std::shared_ptr<T>). For example:

  1. Users model would have std::vector<Posts>
  2. Posts model would have std::vector<Comments>, Banner and User (inverse relationship);
  3. Banners model would have Metadata
  4. Metadata model would have Banner (inverse relationship)

OBS: The inverse relationships would not be loaded since would be a nightmare to manage the references

Now for retrive this relationships would be something like:

user.getPosts(client, 
    [](std::vector<Posts>& posts) {},
    [](const DrogonDbException& e) {}); //this would still run the query and after it sets the posts member inside user
//It's passed by reference because it's a reference to the posts member inside user

std::vector<Posts>& posts = user.getPosts(); //this would only retrive from the member inside user

Not determined

  1. For optional relationships I was thinking in using std::optional<T> as a return type, but this can also be and exception (because std::optional is not widly used in the framework). If this is the case the optionality of the relationship would be configured in the model.json (this is more a problem for has one/belongs to relationships, because for has many we can just return a empty vector).
  2. I was also thinking about change the sql in the relationship functions to use the Mapper<T> functions, opnions are also welcome.
  3. I'm still thinking about many to many relationships
  4. Having a std::future<T> as a way to retrive the relationship, similar to:
Mapper<Users> mp(client);
std::future<Users> user = mp.findFutureByPrimaryKey(id);

It would be:

Mapper<Users> mp(client);
Users user = mp.findByPrimaryKey(id);
std::future<std::vector<Posts>&> posts = user.getFuturePosts(client); 
//this case would be similar to the new way of retriving and updating the cache but it would result in more redable code
//and would throw an exception if something goes wrong

Gab-Menezes avatar Sep 21 '21 20:09 Gab-Menezes

@an-tao as I said in https://github.com/drogonframework/drogon/issues/1023#issuecomment-924002889. What do you think ?

Gab-Menezes avatar Sep 21 '21 20:09 Gab-Menezes

@Gab-Menezes Thanks so much for your suggestion, this is a big requirement of ORM, We should fully discuss before we start to implement it. your approach looks good to me, now I have some questions about it:

  1. The nesting of callbacks increases the complexity of programming. For example, how do I know that all queries have been executed? In your example, It is uncertain which of loadBanners and loadComents will be executed first. (I think we could add some coroutine interfaces to resolve this).
  2. How about adding sub-relationship loading to the eager loading method, (for example automatically load posts in the Users::loadPosts method), and add a enableEagerLoading interface to the Map tempalte.

@rbugajewski @marty1885 @hwc0919 What do u think about this subject?

an-tao avatar Sep 22 '21 12:09 an-tao

I use the ORM mostly for very simple queries so I don’t have any particular opinion about this subject.

rbugajewski avatar Sep 22 '21 17:09 rbugajewski

@an-tao I was testing somethings and I think I found a bug. As I suggested we could change the SQL to the Mapper<T> in the relationships functions. But when I do:

DbClientPtr client = drogon::app().getDbClient();
Mapper<Posts> mp(client);
auto posts = mp.findBy(Criteria(Posts::Cols::_user_id, CompareOperator::EQ, 1));

The program crashes: image Here is the file and line

But when I change from int to string everything works

DbClientPtr client = drogon::app().getDbClient();
Mapper<Posts> mp(client);
auto posts = mp.findBy(Criteria(Posts::Cols::_user_id, CompareOperator::EQ, "1"));

In the wiki we have a example exactly like that.

Do you know the reason ?

Gab-Menezes avatar Sep 23 '21 02:09 Gab-Menezes

what's the type of the user_id column of the Posts table?

an-tao avatar Sep 23 '21 02:09 an-tao

Metadata: image Member variable: image DB table: image

Gab-Menezes avatar Sep 23 '21 20:09 Gab-Menezes

looks like a bug. I'll figure it out.

an-tao avatar Sep 24 '21 01:09 an-tao

So I was thinking and I reached some conclusions (this is a early draft so don't bother with well written things): image image As you can see I changed the void getPosts (default generated by framework) to use Mapper<Posts> instead a raw SQL. (That's the reason I asked about the findBy method)

  1. The asynchronous methods void getPosts and std::future<std::vector<Posts>> getFuturePosts have odd behaviour. Since we don't know wen they will finish the execution the user variable that called it may be undefined (pop out of the stack) (especially for the callback overload). So we can set the posts member inside the user that called it (if the user wants he need to pass user model inside the callback function as a reference and set the cache manually using a setPosts method).
  2. I made a synchronou getPosts (std::vector<Posts>& getPosts(const DbClientPtr& clientPtr)) function, that set the cache inside the user model (this one can alse be used to reload the cache).
  3. And the the cache version only gets from the cache if nothing is setted throws an exception.
  4. As you can see I added a TODO in the callback version because I noticed that it uses a push_back(T(row)) and we can change to emplace_back(row) right ?

I didn't modified the source yet, just changed the generated model file for a draft. Also I was looking inside the source of the findBy function and the CompareOperator::In isn't implemented yet ? If it is, can you send me a example on how to use it (I don't know if I need to pass a string separeted by comma, or a vector and etc...).

And I was also thinking that de loadPosts will need the clientPtr as a parameter (I didn't made the draft of this one because I'm confused about the CompareOperator::In).

Gab-Menezes avatar Sep 24 '21 19:09 Gab-Menezes

You can use CompareOperator::In, it should work.

an-tao avatar Sep 25 '21 01:09 an-tao

Hello @an-tao sorry for the wait. I was really busy in the last months and so wasn't possible to continue this issue. But now I'm back \0/.

I already made some progress but I would like your opnions in somethings (and I would like to give you a round o applause because the coroutine implementation in drogon is enginious specially the MapperAwaiter, I was blown away. Props to you!!!)

  1. The Async methods to recover the relationship (the callback version already exists, I just changed to use Mapper) (the future version, discussed 4.) can't set up the internal cache because the callback may outlive the current object instance. Wrong:
void Posts::getUser(const DbClientPtr& clientPtr,
    const std::function<void(Users)>& rcb,
    const ExceptionCallback& ecb)
{
    Mapper<Users> mp(clientPtr);
    mp.findBy(Criteria(Users::Cols::_id, CompareOperator::EQ, *userId_),
        [rcb, ecb, this](const std::vector<Users>& r) {
            if (r.empty())
            {
                ecb(UnexpectedRows("0 rows found"));
            }
            else if (r.size() > 1)
            {
                ecb(UnexpectedRows("Found more than one row"));
            }
            else
            {
                this->setUser(r[0]); //May cause undefined behaviour
            	rcb(r[0]);
            }
        },
        ecb);
}

Right:

void Posts::getUser(const DbClientPtr &clientPtr,
                    const std::function<void(Users)> &rcb,
                    const ExceptionCallback &ecb) const
{
    Mapper<Users> mp(clientPtr);
    mp.findBy(Criteria(Users::Cols::_id, CompareOperator::EQ, *userId_),
    [rcb, ecb](const std::vector<Users>& r) {
        if (r.empty())
        {
            ecb(UnexpectedRows("0 rows found"));
        }
        else if (r.size() > 1)
        {
            ecb(UnexpectedRows("Found more than one row"));
        }
        else
        {
            rcb(r[0]);
        }
    },
    ecb);
}

So in this case the callback (implemented by the user) should look something like this if he wants to the cache to be set

post.getUser(
	client,
	[post](const Users& user) mutable
	{
		post.setUser(user);
		...
	},
	[](const DrogonDbException& e)
	{

	}
);

Any thoughts on this ?

  1. The coroutine version to recover the relationship would look like this
drogon::Task<std::shared_ptr<Users>> Posts::getCoroUser(const drogon::orm::DbClientPtr& clientPtr)
{
    CoroMapper<Users> mp(clientPtr);
    auto user = co_await mp.findBy(Criteria(Users::Cols::_id, CompareOperator::EQ, *userId_));
    if (user.empty())
        throw UnexpectedRows("0 rows found");
	else if (user.size() > 1)
        throw UnexpectedRows("Found more than one row");

    setUser(user[0]);
    co_return r_user_;
}

It's not possible to return Users& because Task::promise_type::return_value() gets confused because of the overloadings, so we could return void (this would force the user to do .getUser() after it to get the cache, which is not cool), returning a copy to Users would be foolish because we want the user be able to modify the relationship and be reflected in the Posts class. So we return the shared_ptr. This would be a little bit confunsing if I was the user (because the cache doesn't return a shared_ptr), and probably get annoyed. So any ideias ?

  1. The sync version of the eager loading function looks like this Has many case:
void Posts::loadComments(const DbClientPtr &clientPtr, std::vector<Posts>& targets, const std::function<void(std::vector<Comments>&)>& rcb)
{
    std::unordered_map<int64_t, Posts*> map;

    map.reserve(targets.size());

    std::vector<int64_t> originalKeys;
    originalKeys.reserve(targets.size());

    for (auto& target : targets)
    {
        auto originalKey = target.getValueOfId();
        map[originalKey] = &target;
        originalKeys.push_back(originalKey);
    }

    Mapper<Comments> mp(clientPtr);
    auto relations = mp.orderBy(Comments::Cols::_post_id)
                            .findBy(
                                Criteria(
                                Comments::Cols::_post_id,
                                CompareOperator::In, 
                                originalKeys)
                            );
    rcb(relations);

    if (relations.empty()) return;
    auto begin = relations.begin();
    for (auto it = relations.begin(); it != relations.end(); it++)
    {
        const auto relationVal = begin->getValueOfPostId();
        if (relationVal != it->getValueOfPostId())
        {
            const auto end = it - 1;
            std::vector temp_vec(
                std::make_move_iterator(begin), 
                std::make_move_iterator(end)
            );
            map.at(relationVal)->setComments(std::move(temp_vec));
            begin = it;
        }
    }

    //last iteration
    const auto relationVal = begin->getValueOfPostId();
    std::vector temp_vec(
        std::make_move_iterator(begin), 
        std::make_move_iterator(relations.end())
    );
    map.at(relationVal)->setComments(std::move(temp_vec));
}

Has one case:

void Posts::loadUser(const DbClientPtr &clientPtr, std::vector<Posts>& targets, const std::function<void(std::vector<Users>&)>& rcb)
{
    std::unordered_map<int64_t, std::vector<Posts*>> map;

    map.reserve(targets.size());

    std::vector<int64_t> originalKeys;
    originalKeys.reserve(targets.size());

    for (auto& target : targets)
    {
        auto originalKey = target.getValueOfUserId();
        map[originalKey].push_back(&target);
        originalKeys.push_back(originalKey);
    }

    Mapper<Users> mp(clientPtr);
    auto relations = mp.orderBy(Users::Cols::_id)
                            .findBy(
                                Criteria(
                                Users::Cols::_id,
                                CompareOperator::In, 
                                originalKeys)
                            );
    rcb(relations);

    if (relations.empty()) return;
    for (auto it = relations.begin(); it != relations.end(); it++)
    {
        auto parents = map.at(it->getValueOfId());
        for (auto parent : parents)
        {
            parent->setUser(*it);
        }
    }

}

The callback function to load more things after is optional The coroutine version would be similar just change de Mapper to CoroMapper, co_await the findBy, remove the callback and return a Task<> I can't think of a way to implement the future version, because of 2 things:

  • Being async it may suffer from the same thing as 1. (it may outlive the current instances and trigger UB) (so we do the same thing and don't set the internal cache)
  • I don't know where to await/get the future and continue the processing after it. I thought in to this in a event loop and use a promise to write when finish and return a future. But it can't be in the current thread event loop because would cause deadlock since the promise would be in the next job, and the future in the current. One possibility was to use the main event loop, but this is an obvious bad ideia since it would clog the main loop. And my last ideia was to use the same loop as the database connection uses, since is something related to it, but I don't have access to it in "user land". We could create another event loop for this but it may be overkill and involve a lot of changes to the internals. Any ideias ?
  1. The future version to recover the relationship would suffer from the samething as the future version of the eager loading.

  2. The cache currently is get[Relationship name/Alias], but the framework uses this to return the shared_ptr and getValueOf to return the value, should I change ?

I already implemented in drogon_ctl the right version of 1, the Has many/Has one sync version of 3, and 5 I would like to hear your thoughts before continuing.

Gab-Menezes avatar Feb 05 '22 06:02 Gab-Menezes