cstore_fdw
Implement fast skiplist-based aggregation
This is a major enhancement and will take some time to develop.
Currently cstore just returns individual tuples and relies on PostgreSQL for aggregation. This means we're not getting one of the primary benefits of column stores: fast single-column aggregation based on the skiplist indexes.
This would probably involve implementing an executor hook. The complicated part will be working out when we can use the hook (i.e. a known aggregate over a single column) and when we have to hand the query back to Postgres (e.g. multicolumn aggregation, custom aggregates, non-btree aggregation, etc.). Ideally, we would also add an API for supporting custom aggregates that can be built on COUNT/MIN/MAX from the skiplists.
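To make the "use the skiplist when we can, fall back when we can't" idea concrete, here is a minimal Python sketch. It is not cstore_fdw code: `BlockStats`, `FallBackToPostgres`, and `aggregate_from_skiplist` are hypothetical names, and it assumes each skiplist entry records a per-block minimum, maximum, and row count for a column with no NULLs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlockStats:
    """Hypothetical per-block skiplist entry for a single column."""
    minimum: int
    maximum: int
    row_count: int  # assumes the column has no NULLs in this block


class FallBackToPostgres(Exception):
    """Signals that the aggregate cannot be answered from block metadata."""


# Aggregates that can be answered from block metadata alone.
SKIPLIST_AGGREGATES = {"count", "min", "max"}


def aggregate_from_skiplist(agg: str, blocks: List[BlockStats]) -> Optional[int]:
    """Answer COUNT/MIN/MAX from block stats without reading any rows.

    Raises FallBackToPostgres for anything else, so the caller can let
    the regular executor scan the tuples instead.
    """
    if agg not in SKIPLIST_AGGREGATES:
        raise FallBackToPostgres(agg)
    if agg == "count":
        return sum(b.row_count for b in blocks)
    if not blocks:
        return None  # MIN/MAX over zero rows is NULL in SQL
    if agg == "min":
        return min(b.minimum for b in blocks)
    return max(b.maximum for b in blocks)


blocks = [BlockStats(3, 9, 100), BlockStats(-2, 5, 50)]
total_rows = aggregate_from_skiplist("count", blocks)
smallest = aggregate_from_skiplist("min", blocks)
largest = aggregate_from_skiplist("max", blocks)
```

Anything outside the supported set (a custom aggregate, say) raises `FallBackToPostgres`, which stands in for handing the query back to the normal executor path.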
I'm pretty new to PostgreSQL internals, and I was wondering whether this will be impacted or improved by the proposed Custom Plan API in 9.5. Here is a link to the relatively lengthy threaded discussion on -hackers:
http://postgresql.1045698.n5.nabble.com/v9-5-Custom-Plan-API-tt5802851.html#none
@besquared thanks for the link. No plans to support this in the 1st quarter of 2015. I think after we release v1.4, improving the speed of aggregation will be among our top priorities for the rest of 2015.
This would be amazing, and it would be the main feature that allows using Postgres as an industrial historian (GE, Osisoft, and Siemens have products that cost big $$$). Suppose you can run "select date_trunc('month',date) as m_date, avg(sensor5_value) from my_sensor_table group by m_date;" and know that internally the query just calculates the avg from the block index aggregations (the skiplist) instead of stupidly summing and counting all the actual rows, and is therefore super fast. Then you can do away with the whole stack of products and temp tables you see littered across industrial databases.
Thanks @philliproso for the insight. This feature seems to be in high demand, which moves it higher in our priorities.
I think I'll spend the next month researching the best way to do this, and then we can plan it.
Hello @pykello, if there has been any further development on this, can you leave a link here where I can learn more? We are definitely interested in single-column aggregation from cstore_fdw to get the performance boost on our lightning-strike climate data. Thanks. - shawn
+1