duckdb-r
`round()` semantics in R differ from its duckdb translation
In the current state of duckdb-r, the semantics of round() in R differ from those of its duckdb translation when the digit being rounded off is exactly "5".
See the difference in the following reprex:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(duckdb)
#> Loading required package: DBI
library(DBI)
df <- tibble(x = c(0.5, 1.5))
df |>
  mutate(
    r = round(x, 0L)
  )
#> # A tibble: 2 × 2
#>       x     r
#>   <dbl> <dbl>
#> 1   0.5     0
#> 2   1.5     2
con <- DBI::dbConnect(duckdb())
duckdb::duckdb_register(con, "df", df)
tbl(con, "df") |>
  mutate(
    r = round(x, 0L),
    r2 = round_even(x, 0L)
  )
#> # Source: SQL [2 x 3]
#> # Database: DuckDB v0.10.0 [xxx@Windows 10 x64:R 4.3.2/:memory:]
#>       x     r    r2
#>   <dbl> <dbl> <dbl>
#> 1   0.5     1     0
#> 2   1.5     2     2
DBI::dbDisconnect(con)
Created on 2024-04-29 with reprex v2.1.0
From the round() documentation in R:
Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE 754’) is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2. However, this is dependent on OS services and on representation error (since e.g. 0.15 is not represented exactly, the rounding rule applies to the represented number and not to the printed number, and so round(0.15, 1) could be either 0.1 or 0.2).
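To make the two rounding rules concrete, here is a small base-R sketch. `round_half_away()` is a hypothetical helper written only for this illustration; it assumes that DuckDB's default round() rounds halves away from zero, which is consistent with the 0.5 -> 1 result in the reprex but is not confirmed here for negative inputs.

```r
# Hypothetical helper (not part of base R, dplyr, or duckdb):
# rounds halves away from zero, the behaviour assumed for DuckDB's round().
round_half_away <- function(x, digits = 0L) {
  scale <- 10^digits
  sign(x) * floor(abs(x) * scale + 0.5) / scale
}

# Values that are exactly representable, so representation error plays no role.
x <- c(0.5, 1.5, 2.5, -1.5)

round(x)            # base R, "go to the even digit": 0 2 2 -2
round_half_away(x)  # half away from zero:            1 2 3 -2
```

The two rules only disagree on exact ties such as 0.5 and 2.5; everywhere else they return the same value.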
The default duckdb implementation of round() does not follow this convention (see e.g. the discussion in duckdb issue); as the reprex shows, it rounds 0.5 to 1. An alternative duckdb function, round_even(), does follow it.
Should the SQL translation map R's round() to duckdb's round_even() so that results are consistent between R and duckdb?
Thanks, good catch. Yes, translating to round_even() sounds like the right thing to do. Would you like to contribute?
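For anyone picking this up, a rough sketch of what such a translation could look like with dbplyr's translator tools follows. This is an assumption about the shape of the change, not the actual duckdb-r code: the name `round_translation` is hypothetical, and how duckdb-r wires its translators together may differ.

```r
library(dbplyr)

# Hypothetical translator fragment: maps R's round() to DuckDB's round_even().
# `round_translation` is an illustrative name, not an object in duckdb-r.
round_translation <- sql_translator(
  .parent = base_scalar,
  round = function(x, digits = 0L) {
    # Coerce to integer so a literal such as 0 is rendered as an integer
    # in the generated SQL; this assumes `digits` is a constant.
    digits <- as.integer(digits)
    sql_expr(ROUND_EVEN(!!x, !!digits))
  }
)
```

Until a change like this lands, calling round_even() directly inside mutate(), as in the reprex above, gives the R-compatible result.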