bigrquery icon indicating copy to clipboard operation
bigrquery copied to clipboard

dbplyr functions for Google Big Query

Open jkylearmstrong opened this issue 4 years ago • 1 comments

request is for additional functions / wrappers for interface with Google Big Query (GBQ)- for instance:

DATE_SUB: mutate(lookback_start = sql("DATE_SUB(TRAINING_END_DATE, INTERVAL 30 DAY)")

pivot the current GBQ standard is to supply listing of entries for pivot columns -whereas tidyr::piviot_wider will assume that everything within the pivot_column will be pivioted - as often the case is. it would be easier to first implement a filter then apply piviot for all columns rather than specificity a listing of entries which are to be transposed see: (https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#pivot_operator)

unpiviot

GBQ_ML - are there functions to call GBQ ML library - for instance see: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-glm could this be implemented with dbSendQuery ?

compute - often it is the case that temp tables cannot be added to project and or dataset which is specified within connection string and there are alternative projects/schemas for which tables can be written. However, the default appears to be to write tables within the existing connection to project-dataset; so perhaps parameters could be added so user can specify these requirements.

jkylearmstrong avatar Nov 30 '21 07:11 jkylearmstrong

Here is an example for pivot

packages

library(bigrquery)
library(DBI)
library(dplyr)

project settings

my_project <- "a"
my_dataset <- "b"
my_billing <- 'c'

user settings

my_user <- "d"
my_sup_project <- "e"
my_table <- "f"

connection

con <- dbConnect(
  bigrquery::bigquery(),
  project = my_project,
  dataset = my_dataset,
  billing = my_billing
)

sample data

data_local <- tribble(
  ~product, ~sales,  ~quarter,
  "Kale", 51, "Q1",  
  "Kale", 23,  "Q2",
  "Kale", 45,  "Q3",
  "Kale", 3, "Q4",
  "Apple", 77, "Q1",
  "Apple", 0 , "Q2",
  "Apple", 25, "Q3",
  "Apple", 2 , "Q4"  
)

data_local

upload data to GBQ

table_full_name <- paste0(my_sup_project, ".",my_user,".", my_table)
bq_table_upload(table_full_name,data_local)

connect to GBQ table

data_gcp <- tbl(con, table_full_name)
data_gcp

pivot function prototype

pivot.sum <- function(dplyr_sql,
                 input_column,
                 pivot_column){
     
    output_q <- capture.output(show_query(dplyr_sql), type="output")
    q <- paste(tail(output_q,-1), collapse=" ")
    VIEW_SQL <- q

    
    t_cols_list_1 <- dplyr_sql %>%
    select(all_of(pivot_column)) %>%
    distinct() %>%
    collect() 

    t_list_2 <- t_cols_list_1 %>%
        mutate(transpose_column = !!sym(pivot_column))
     
    t_cols_list_5 <- paste0(paste0('"', t_list_2$transpose_column , '"'), collapse = " , ")
   
    sql6 <- paste0("SELECT * FROM 
                    (", VIEW_SQL, ")
                    PIVOT(SUM(",input_column,") FOR ", pivot_column," IN (", t_cols_list_5, "))  ")
    
    return(tbl(con, sql(sql6)))
} 

test function

pivot_data_gcp <- data_gcp %>%
    pivot.sum(input_column = "sales",
          pivot_column = "quarter")

pivot_data_gcp 

jkylearmstrong avatar Nov 30 '21 13:11 jkylearmstrong

Unfortunately pivotting is currently out of scope for bigrquery due to the development backlog I need to get to first.

hadley avatar Nov 03 '23 12:11 hadley