bigQueryR
bigQueryR copied to clipboard
Provided Schema does not match Table
Hey
I am trying to download app data from bigquery and getting the above error. Firebase is connected to bq and I have "events_" table there.
I am trying to download the data using this function bqr_partition("events_2021", "ga_partition", my_project, my_dataset)
But I am getting this error and I dont understand how to solve it
Job had error:
Provided Schema does not match Table xxxx-native-app-33f08:analytics_153017681.ga_partition$20210601. Cannot add fields (field: event_params) [1] "Provided Schema does not match Table xxxx-native-app-33f08:analytics_153017681.ga_partition$20210601. Cannot add fields (field: event_params)"
Do I need to specify the schema for this function? I tried narrowing the function even more bqr_partition("events_202106", "ga_partition", my_project, my_dataset) Still getting the same error
Thanks
bqr_partition doesn't download data, it creates a partitioned table. Do you need bqr_query() ?
Hey Thanks
I have my app connected to bigquery and I want to download all events data to R and manipulate the data there. I know that the data is saved in a daily events table and I want to get the data for all of them so I though I need the bqr_partition to combine them to one :-)
I need to use bqr_query() to download all events_YYYYMMDD table?
something like this? bqr_query(my_project,my_dataset, "SELECT * FROM [native-app-33f08:analytics_153017681.events_]" ) ?
or do I need to change the events_ on a daily basis? bqr_query(my_project,my_dataset, "SELECT * FROM [native-app-33f08:analytics_153017681.events_20210606]" ) bqr_query(my_project,my_dataset, "SELECT * FROM [native-app-33f08:analytics_153017681.events_20210607]" )
etc...
Thank you so much !
Yes, I would try out the queries in the web console first then something like the examples above will work. If its a lot of data you may also want to do it asynchronously via bq_query_async().
You could also use googleAnalyticsR and use the Data API to download the events which is free.
Thank you. It worked in the console and in R, now I just need to make it go over all the events_YYYY
Not sure what you meant by the Data API, this account is not an app + web (GA4) Its firebase account linked to BQ. How can I get the events data using this API?
Tried searching for it using those functions and its not there
ga4 <- ga_account_list("ga4") data_accounts <- ga_account_list("data")
Ah ok, fair enough then. I suggest using standardSQL (e.g. useLegacySql=FALSE in the function) and then you can use wildcards for querying many tables at once - FROM dataset.table_*
Thanks for the great help and feedback
job <- bqr_query_asynch(projectId = my_project,
datasetId = my_dataset,
"SELECT * FROM `native-app-33f08.analytics_153017681.events_*`",
destinationTableId = "bigResultTable",
useLegacySql = FALSE
# ,writeDisposition = c("WRITE_EMPTY","WRITE_TRUNCATE", "WRITE_APPEND")
)
This Works fine, getting the data in BQ in a new table. But trying to extract the data to google cloud is the next step and then get it from there to R back?
job_extract <- bqr_extract_data(projectId = my_project,
datasetId = my_dataset,
tableId = "bigResultTable",
cloudStorageBucket = "bigquery_r",
filename = paste0("big-query-extract-", gsub(" |:|-", "", Sys.time()),"-*.csv")
)
jobtoextract <- bqr_get_job(projectId = my_project, jobId = job_extract$jobReference$jobId)
bqr_wait_for_job(jobtoextract, wait = 5)
The extraction doesnt work for some reason
Job had error:
Operation cannot be performed on a nested schema. Field: event_params [1] "Operation cannot be performed on a nested schema. Field: event_params" Warning message: In bqr_wait_for_job(jobtest, wait = 5) : Operation cannot be performed on a nested schema. Field: event_params
The extract is expecting a flat table from your query, I guess your SQL is writing back nested formatted JSON
The firebase events__ table will by default be nested no?
because of the events_param and user_properties

Yes, to make flat tables you typically use UNNEST() to extract out the rows.
oh ok, not familiar with SQL that much, will try and look online for some resources about how to do it
Thank you so much for all your help.
job <- bqr_query_asynch(projectId = my_project,
datasetId = my_dataset,
"SELECT event_name, param, event_date FROM `native-app-33f08.analytics_153017681.events_*`, UNNEST(event_params) as param",
destinationTableId = "bigResultTable6",
useLegacySql = FALSE
,writeDisposition = c("WRITE_TRUNCATE")
)
job_extract <- bqr_extract_data(projectId = my_project,
datasetId = my_dataset,
tableId = "bigResultTable6",
cloudStorageBucket = "gs://bigquery_r",
filename = paste0("big-query-extract-", gsub(" |:|-", "", Sys.time()),"-*.csv")
)
still give me Operation cannot be performed on a nested schema. Field: param
Kind of lost here if you guide me I would highly appreciate it