pgsync icon indicating copy to clipboard operation
pgsync copied to clipboard

Support COUNT for children

Open rutkowskib opened this issue 3 years ago • 3 comments

PGSync version: 2.3.1

Problem Description:

I need to have a column created based on COUNT query of related table. I have 2 tables - users and users_followers. In users table there is info about users, and users_followers table stores who is followed by whom. They look something like this:

Users 
userId: uuid 
username: varchar
Users_followers
userId: uuid(users FK)
followerId: uuid(users FK)

I need to have information in ES about number of followers(I don't need information who it is or anything like that). It would look something like this:

userId: string
username: string
followers: integer 

I could achieve this probably using children in schema and then transforming data in plugin, but support out of the box would also be nice.

Thank you for your work on this project, and all help will be much appreciated.

rutkowskib avatar Jul 14 '22 14:07 rutkowskib

I could also use this. I was thinking of just going ahead and indexing the actual followers and querying by the count in elastic search but that would be wasting some elastic search storage and probably has some overhead. Also I don't know if elastic search supports aggregate logic in queries

Looking at the source for how concat is implemented, I wonder if it might be not such a stretch to implement count https://github.com/toluaina/pgsync/blob/209181eb19ad9c99adf807907d8842d9e06cbb4c/pgsync/transform.py#L57

candidia avatar Jul 17 '22 03:07 candidia

I don't think this is correct place to implement what I want. If implemented as a transform it would still need fetching joined table to pgsync. I have already done that using a plugin(code below).

from pgsync import plugin

class FollowersPlugin(plugin.Plugin):
    name = 'Followers'

    def transform(self, doc, **kwargs):
        print(doc)
        if hasattr(doc['users_followers'], "__len__"):
            doc['followers'] = len(doc['users_followers'])
        else:
            doc['followers'] = 0
        del doc['users_followers']

        return doc

I would rather to do it on db side, because number of followers can grow very large, and then it would be very expensive to do join and fetch data to pgsync.

rutkowskib avatar Jul 18 '22 09:07 rutkowskib

  • I've not found an easy way to implement aggregates.
  • COUNT is probably the easiest but there are others which are not easy to generalize.
  • You can do this in the plugin by accessing the doc[_"meta"] field which stores the id of each child table
  • So you can use something like this in the plugin code above
    • doc['followers'] = len(doc['_meta']['users_followers']['id'])

toluaina avatar Jul 18 '22 12:07 toluaina