dbplyr functions for Google BigQuery
The request is for additional functions / wrappers for interfacing with Google BigQuery (GBQ), for instance:
DATE_SUB: mutate(lookback_start = sql("DATE_SUB(TRAINING_END_DATE, INTERVAL 30 DAY)"))
pivot: the current GBQ standard SQL requires an explicit listing of entries for the pivot columns, whereas tidyr::pivot_wider assumes that every value within the pivot column will be pivoted, as is often the case. It would be easier to first apply a filter and then pivot on all values, rather than specify a listing of entries to be transposed. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#pivot_operator
unpivot
GBQ_ML: are there functions to call the BigQuery ML library? For instance, see https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-glm. Could this be implemented with dbSendQuery?
compute: often temp tables cannot be added to the project and/or dataset specified in the connection string, while there are alternative projects/schemas to which tables can be written. However, the default appears to be to write tables to the project/dataset of the existing connection, so perhaps parameters could be added so the user can specify an alternative target.
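For the GBQ_ML point, one option today may be to assemble the CREATE MODEL statement as a string and execute it via DBI::dbSendQuery; a minimal sketch, where the model, label, and table names are hypothetical and only the GLM syntax from the page linked above is assumed:

```r
library(DBI)

# Assemble a BigQuery ML CREATE MODEL statement; model_type and
# input_label_cols follow the CREATE MODEL (GLM) syntax linked above
make_create_model_sql <- function(model_name, label_column, source_table) {
  paste0(
    "CREATE OR REPLACE MODEL `", model_name, "` ",
    "OPTIONS(model_type = 'LINEAR_REG', input_label_cols = ['",
    label_column, "']) ",
    "AS SELECT * FROM `", source_table, "`"
  )
}

# hypothetical names, for illustration only
ml_sql <- make_create_model_sql("my_project.my_dataset.sales_model",
                                "sales",
                                "my_project.my_dataset.training_data")

# with a live connection `con`, the statement could then be executed:
# res <- dbSendQuery(con, ml_sql)
# dbClearResult(res)
```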
Here is an example for pivot
# packages
library(bigrquery)
library(DBI)
library(dplyr)
# project settings
my_project <- "a"
my_dataset <- "b"
my_billing <- "c"
# user settings
my_user <- "d"
my_sup_project <- "e"
my_table <- "f"
# connection
con <- dbConnect(
bigrquery::bigquery(),
project = my_project,
dataset = my_dataset,
billing = my_billing
)
# sample data
data_local <- tribble(
~product, ~sales, ~quarter,
"Kale", 51, "Q1",
"Kale", 23, "Q2",
"Kale", 45, "Q3",
"Kale", 3, "Q4",
"Apple", 77, "Q1",
"Apple", 0 , "Q2",
"Apple", 25, "Q3",
"Apple", 2 , "Q4"
)
data_local
# upload data to GBQ
table_full_name <- paste0(my_sup_project, ".", my_user, ".", my_table)
bq_table_upload(table_full_name, data_local)
# connect to GBQ table
data_gcp <- tbl(con, table_full_name)
data_gcp
# pivot function prototype
pivot.sum <- function(dplyr_sql,
                      input_column,
                      pivot_column) {
  # recover the connection and the SQL behind the lazy tbl, rather than
  # scraping show_query() output or relying on a global `con`
  con <- dbplyr::remote_con(dplyr_sql)
  view_sql <- dbplyr::remote_query(dplyr_sql)
  # collect the distinct values of the pivot column
  pivot_values <- dplyr_sql %>%
    select(all_of(pivot_column)) %>%
    distinct() %>%
    collect()
  in_list <- paste0('"', pivot_values[[pivot_column]], '"', collapse = ", ")
  pivot_sql <- paste0(
    "SELECT * FROM (", view_sql, ") ",
    "PIVOT(SUM(", input_column, ") FOR ", pivot_column,
    " IN (", in_list, "))"
  )
  tbl(con, sql(pivot_sql))
}
# test function
pivot_data_gcp <- data_gcp %>%
pivot.sum(input_column = "sales",
pivot_column = "quarter")
pivot_data_gcp
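For comparison, tidyr::pivot_wider infers the pivoted columns from the data itself; on the local copy of the sample data the requested behaviour looks like:

```r
library(dplyr)
library(tidyr)

# same sample data as above
data_local <- tribble(
  ~product, ~sales, ~quarter,
  "Kale",  51, "Q1",
  "Kale",  23, "Q2",
  "Kale",  45, "Q3",
  "Kale",   3, "Q4",
  "Apple", 77, "Q1",
  "Apple",  0, "Q2",
  "Apple", 25, "Q3",
  "Apple",  2, "Q4"
)

# no listing of Q1..Q4 is needed: pivot_wider derives the new
# columns from the distinct values of `quarter`
pivot_local <- data_local %>%
  pivot_wider(names_from = quarter, values_from = sales)
```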
Unfortunately pivoting is currently out of scope for bigrquery due to the development backlog I need to get to first.