cstore_fdw

Implement fast skiplist-based aggregation

Open jberkus opened this issue 11 years ago • 6 comments

This is a major enhancement and will take some time to develop.

Currently cstore just returns individual tuples and relies on PostgreSQL for aggregation. This means we miss out on one of the primary benefits of column stores: fast single-column aggregation based on the skiplist indexes.
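Answering aggregates from the skiplists rather than from individual tuples can be sketched in Python. This is a minimal illustration, not cstore_fdw code: `BlockSkipNode` and its fields are hypothetical stand-ins for a per-block skip-index entry, assumed to record the block's row count plus the minimum and maximum of one column. Under that assumption, COUNT(*), MIN, and MAX need only one small entry per block instead of every tuple.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BlockSkipNode:
    """Hypothetical per-block skip-index entry for one column."""
    row_count: int              # number of rows stored in the block
    min_value: Optional[int]    # None when the block holds only NULLs
    max_value: Optional[int]


def count_star(nodes: Sequence[BlockSkipNode]) -> int:
    """COUNT(*) from the per-block row counts alone."""
    return sum(n.row_count for n in nodes)


def column_min(nodes: Sequence[BlockSkipNode]) -> Optional[int]:
    """MIN(col): the smallest per-block minimum; None if every value is NULL."""
    mins = [n.min_value for n in nodes if n.min_value is not None]
    return min(mins) if mins else None


def column_max(nodes: Sequence[BlockSkipNode]) -> Optional[int]:
    """MAX(col): the largest per-block maximum; None if every value is NULL."""
    maxes = [n.max_value for n in nodes if n.max_value is not None]
    return max(maxes) if maxes else None
```

Because each entry summarizes a whole block of rows, the work grows with the number of blocks rather than the number of rows. A real implementation would live in the extension's C code.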

This would probably involve implementing an executor hook. The hard part will be deciding when we can use the hook (i.e., single-column aggregation with a known aggregate) and when we must hand the query back to Postgres (e.g., multicolumn aggregation, custom aggregates, non-btree aggregation). Ideally, we would also add an API that lets custom aggregates build on COUNT/MIN/MAX from the skiplists.
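The decision between the fast path and falling back to Postgres might look roughly like the following. It is a hypothetical sketch: `AggregateQuery`, its fields, and `choose_executor` are invented for illustration, and a real executor hook would inspect PostgreSQL's query or plan tree in C rather than a Python dataclass. Each fallback branch corresponds to one of the cases listed above.

```python
from dataclasses import dataclass
from typing import Tuple

# Aggregates that per-block COUNT/MIN/MAX metadata can answer directly.
SKIPLIST_AGGREGATES = frozenset({"count", "min", "max"})


@dataclass(frozen=True)
class AggregateQuery:
    """Hypothetical, simplified description of an aggregate query."""
    function: str              # aggregate name, e.g. "max"
    columns: Tuple[str, ...]   # columns the aggregate reads; () for count(*)
    is_builtin: bool           # False for user-defined (custom) aggregates
    has_btree_ordering: bool   # column type has a btree ordering (needed for min/max)


def choose_executor(q: AggregateQuery) -> str:
    """Return "skiplist" when block metadata suffices, else "postgres"."""
    if not q.is_builtin:
        return "postgres"      # custom aggregate
    if q.function not in SKIPLIST_AGGREGATES:
        return "postgres"      # e.g. avg, sum, string_agg
    if len(q.columns) > 1:
        return "postgres"      # multicolumn aggregation
    if q.function in ("min", "max") and not q.has_btree_ordering:
        return "postgres"      # non-btree aggregation
    return "skiplist"
```

Keeping the check conservative means any query the fast path does not clearly understand still gets PostgreSQL's normal, correct aggregation.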

jberkus • May 05 '14 22:05

I'm fairly new to PostgreSQL internals, and I was wondering whether this would be affected or made easier by the proposed Custom Plan API in 9.5. Here is a link to the relatively lengthy discussion thread on -hackers:

http://postgresql.1045698.n5.nabble.com/v9-5-Custom-Plan-API-tt5802851.html#none

besquared • Jul 23 '14 19:07

@besquared thanks for the link. We have no plans to support this in the first quarter of 2015. After we release v1.4, I think improving aggregation speed will be among our top priorities for the rest of 2015.

pykello • Dec 23 '14 14:12

This would be amazing, and it would be the main feature that allows using Postgres as an industrial historian (GE, OSIsoft, and Siemens sell products for this that cost big $$$$$$$s). If you can run "select date_trunc('month',date) as m_date, avg(sensor5_value) from my_sensor_table group by m_date;" and know that internally the query will calculate the avg from the block index aggregations (the skip list) instead of naively summing and counting every actual row, and will therefore be very fast, you can do away with a whole stack of products and temp tables you see littered across industrial databases.
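The grouped average in this query can be computed from per-block summaries whenever a block lies entirely within one month. The sketch below assumes, hypothetically, that each block records the minimum and maximum of the date column (as a skip index would) plus a precomputed sum and non-null count of `sensor5_value`; the thread only mentions COUNT/MIN/MAX in the skiplists, so the stored sum is an extra assumption. `Block`, `month_start`, and `monthly_avg` are illustrative names, not cstore_fdw APIs. Blocks that straddle a month boundary fall back to scanning their rows.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    """Hypothetical block summary; not cstore_fdw's actual on-disk format."""
    date_min: date       # skip-index minimum of the date column
    date_max: date       # skip-index maximum of the date column
    value_sum: float     # assumed precomputed SUM(sensor5_value) for the block
    value_count: int     # assumed precomputed COUNT(sensor5_value), NULLs excluded
    rows: List[Tuple[date, Optional[float]]] = field(default_factory=list)


def month_start(d: date) -> date:
    """Equivalent of date_trunc('month', d) for a plain date."""
    return d.replace(day=1)


def monthly_avg(blocks: List[Block]) -> Dict[date, Optional[float]]:
    totals: Dict[date, list] = {}  # month -> [sum, count]
    for b in blocks:
        month = month_start(b.date_min)
        if month == month_start(b.date_max):
            # Fast path: the whole block falls in one month, so its
            # precomputed aggregates are added without touching rows.
            acc = totals.setdefault(month, [0.0, 0])
            acc[0] += b.value_sum
            acc[1] += b.value_count
        else:
            # Slow path: the block straddles a month boundary; scan its rows.
            for d, v in b.rows:
                acc = totals.setdefault(month_start(d), [0.0, 0])
                if v is not None:
                    acc[0] += v
                    acc[1] += 1
    # Like SQL avg(), a group whose values are all NULL averages to None.
    return {m: (s / c if c else None) for m, (s, c) in sorted(totals.items())}
```

For sensor data loaded in time order, most blocks fall inside a single month, so only the few boundary blocks are scanned row by row.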

philliproso • Jan 18 '15 07:01

Thanks for the insight, @philliproso. This feature seems to be in high demand, which moves it higher in our priorities.

Over the next month I'll research the best way to implement this; then we can plan the work.

pykello • Jan 19 '15 09:01

Hello @pykello, if there has been further development on this, could you leave a link here where I can learn more? We are definitely interested in single-column aggregation in cstore_fdw to get a performance boost on our lightning-strike climate data. Thanks. - shawn

koppenhoefer • May 27 '16 17:05

+1

jspeis • Jun 23 '16 11:06