duckplyr icon indicating copy to clipboard operation
duckplyr copied to clipboard

Need to be stricter about column compatibility in joins

Open krlmlr opened this issue 9 months ago • 3 comments

Also, use simple identity instead of r_base::== for joins.

dplyr::union_all(data.frame(a = 1), data.frame(a = "2"))
#> Error in `dplyr::union_all()`:
#> ! `x` and `y` are not compatible.
#> ✖ Incompatible types for column `a`: double vs character.
dplyr::setdiff(data.frame(a = 1), data.frame(a = "2"))
#> Error in `dplyr::setdiff()`:
#> ! `x` and `y` are not compatible.
#> ✖ Incompatible types for column `a`: double vs character.
dplyr::left_join(data.frame(a = 1), data.frame(a = "2"), by = "a")
#> Error in `dplyr::left_join()`:
#> ! Can't join `x$a` with `y$a` due to incompatible types.
#> ℹ `x$a` is a <double>.
#> ℹ `y$a` is a <character>.
dplyr::semi_join(data.frame(a = 1), data.frame(a = "2"), by = "a")
#> Error in `dplyr::semi_join()`:
#> ! Can't join `x$a` with `y$a` due to incompatible types.
#> ℹ `x$a` is a <double>.
#> ℹ `y$a` is a <character>.

duckplyr:::duckplyr_union_all(data.frame(a = 1), data.frame(a = "2"))
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Union
#>   r_dataframe_scan(0x11d52b258)  r_dataframe_scan(0x11d0c4e38)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (VARCHAR)
#> 
#>     a
#> 1 1.0
#> 2   2
duckplyr:::duckplyr_setdiff(data.frame(a = 1), data.frame(a = "2"))
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Except
#>   r_dataframe_scan(0x11d556c70)  r_dataframe_scan(0x11d55aea0)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (VARCHAR)
#> 
#>     a
#> 1 1.0
duckplyr:::duckplyr_left_join(data.frame(a = 1), data.frame(a = "2"), by = "a")
#> Error in `left_join()` at duckplyr/R/left_join.R:19:3:
#> ! Can't join `x$a` with `y$a` due to incompatible types.
#> ℹ `x$a` is a <double>.
#> ℹ `y$a` is a <character>.
duckplyr:::duckplyr_semi_join(data.frame(a = 1), data.frame(a = "2"), by = "a")
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Join REGULAR SEMI ___eq_na_matches_na(lhs.a, rhs.a)
#>   r_dataframe_scan(0x14aace6d0)
#>   r_dataframe_scan(0x13b892678)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> 
#> [1] a
#> <0 rows> (or 0-length row.names)

Created on 2024-05-16 with reprex v2.1.0

krlmlr avatar May 16 '24 03:05 krlmlr