extensions icon indicating copy to clipboard operation
extensions copied to clipboard

[firestore-bigquery-export] switch to materialized views

Open meyerovb opened this issue 4 years ago • 10 comments
trafficstars

Since this isnt using materialized views I need to scheduled copy its views into physical tables in order to use bigquery bi engine with it in google data studio. if it was using mat views data studio could use accelerated access

meyerovb avatar Sep 11 '21 01:09 meyerovb

Thanks @meyerovb

I have updated this for discussion.

dackers86 avatar Sep 14 '21 15:09 dackers86

@meyerovb thanks for the suggestion. is there anything stopping you from generating the materialized views yourself based on the gen schema scripts?

i14h avatar Dec 21 '21 18:12 i14h

@i14h I have tried to create a Materialized View from one of the current views. I had to reference the raw table (Materialized Views only work with native tables), rather than the <table>_raw_latest view and received the error Materialized views do not support analytic functions or WITH OFFSET.

jasonberryman avatar Dec 22 '21 16:12 jasonberryman

HI @jasonberryman

Did you manage resolve a Materialised view for your project?

Based on the current data, this appears tricky as Materialised Views appear to be more limited than standard views.

I'll move this issue back to accepted to investigate whether a script os possible for this.

dackers86 avatar Jan 11 '23 11:01 dackers86

Hey guys, do you have any ideas how to create materialized view for <collection>_raw_latest views? latest views are using PARTITION BY and ORDER BY to fetch latest timestamp, flag if document is deleted etc. But, ORDER operation is not allowed in the BQ materialized views.

Any tips and tricks how to overcome this issue? I am thinking about writing Firebase -> BQ connector which will use DML to update existing rows. What do you think?

cimox avatar Feb 15 '23 09:02 cimox

+1 vote for matterialized views

jshandorov1 avatar Dec 13 '23 07:12 jshandorov1

Hey guys, do you have any ideas how to create materialized view for <collection>_raw_latest views? latest views are using PARTITION BY and ORDER BY to fetch latest timestamp, flag if document is deleted etc. But, ORDER operation is not allowed in the BQ materialized views.

Any tips and tricks how to overcome this issue? I am thinking about writing Firebase -> BQ connector which will use DML to update existing rows. What do you think?

@cimox I have the same question. Have you found a way to achieve this?

davestimpert avatar Mar 27 '24 13:03 davestimpert

@davestimpert I did not find a solution to this issue at the time. However, I created a custom set of materialized views to improve performance, which worked nicely for my use case. Unfortunately, I did not need to work with these Firebase export datasets since then, so no more progress on my end.

cimox avatar Mar 27 '24 15:03 cimox

@davestimpert I am not maintaining that system anymore so writing it based on my memory from few years back. I think we used the raw_changlog table to create a materialized view that flattened all the columns. We couldn't do any filtering in this view, which is sad. Then we created a latest view based on it and used same logic as what Google gives in raw_latest. I had attended some Google sessions at that time and I had heard about some enhancements being planned by Google for materialized view that will make it easier for creating the latest_view from raw_changelog and have it materialized in one step. If you are a paid customer, it might not hurt to create a ticket for the Google BQ team to confirm the possibilities.

RajvirBains avatar Mar 27 '24 16:03 RajvirBains

I dont know what I'm doing but maybe this query would do the trick ? It seems compatible with materialized views, and how I understand it, MAX_BY will run for each GROUP

SELECT
  document_id,
  document_name,
  MAX(timestamp) AS latest_timestamp,
  MAX_BY(operation, timestamp) AS latest_operation,
  MAX_BY(data, timestamp) AS latest_data
FROM
  `xxx.xxx.raw_changelog`
GROUP BY
  document_id, document_name

It wont exclude the deleted documents, but materialized views seems to support WITH statements

Aetherall avatar Oct 21 '24 10:10 Aetherall

https://github.com/firebase/extensions/pull/2258 - change tracker

https://github.com/firebase/extensions/pull/2262 - extension

cabljac avatar Jan 31 '25 10:01 cabljac