dbplyr icon indicating copy to clipboard operation
dbplyr copied to clipboard

Join operation with `na_matches="na"` returns invalid query for Redshift

Open krystian8207 opened this issue 2 months ago • 0 comments

Defining joins for Redshift (e.g. using left_join) with na_matches="na" argument returns query that includes IS NOT DISTINCT FROM statement which is not supported by the driver.

db <- DBI::dbConnect(
  RPostgres::Redshift(),
  dbname   = Sys.getenv("REDSHIFT_DBNAME"),
  host     = Sys.getenv("REDSHIFT_HOST"),
  port     = Sys.getenv("REDSHIFT_PORT"),
  user     = Sys.getenv("REDSHIFT_USERNAME"),
  password = Sys.getenv("REDSHIFT_PASSWORD")
)
my_tbl <- dplyr::tbl(db, "mytbl")
join_query <- left_join(my_tbl, my_tbl, na_matches = "na")

join_query |> dplyr::show_query()
#> <SQL>
#>   SELECT "mytbl_LHS".*
#>   FROM "mytbl" AS "mytbl_LHS"
#> LEFT JOIN "mytbl" AS "mytbl_RHS"
#> ON (
#>   "mytbl_LHS"."id" IS NOT DISTINCT FROM "mytbl_RHS"."id" 
#> )

Trying to collect the data responds with the following error:

dplyr::collect(join_query)
#> Error in `dplyr::collect()`:
#> ! Failed to collect lazy table.
#> Caused by error:
#> ! Failed to prepare query: ERROR:  syntax error at or near "DISTINCT"
#> LINE 5:     "mytbl_LHS"."id" IS NOT DISTINCT FROM "mytbl...

krystian8207 avatar Jun 25 '24 11:06 krystian8207