Encoding issue with identifiers passed via dbWriteTable or copy_to?

Open majazaloznik opened this issue 2 years ago • 3 comments

I have both server and client encoding set to UTF-8, yet writing a table that has e.g. windows-1252 characters in the column names causes an error or, in the worst case, even crashes R.

The example uses dplyr::copy_to(), but exactly the same thing happens with DBI::dbWriteTable().

library(DBI)  # provides dbConnect() and dbGetQuery(); the driver below is RPostgres::Postgres()

# Replace connection parameters here
con <- dbConnect(RPostgres::Postgres(),
                 # client_encoding = "utf8", # makes no difference
                 dbname = "sandbox",
                 host = "localhost",
                 port = 5432,
                 user = "postgres",
                 password = Sys.getenv("PG_local_PG_PSW"))
# check encoding
dbGetQuery(con, "SHOW server_encoding")
#   server_encoding
# 1            UTF8
dbGetQuery(con, "SHOW client_encoding")
#   client_encoding
# 1            UTF8

# create df with UTF8 in values
df1 <- data.frame(a = c("Č", "Š", "Ž"))
# this works fine
dplyr::copy_to(con, df1, "df1", temporary = TRUE)

# create df with UTF8 in names
df2 <- data.frame(ČxŠxŽ = c("Č", "Š", "Ž"))
# this throws an error
dplyr::copy_to(con, df2, "df2", temporary = TRUE)
# Error: Failed to prepare query: ERROR:  invalid byte sequence for encoding "UTF8": 0xc8 0x78

# create df with single UTF8 character in name
df3 <- data.frame(Č = c("Č", "Š", "Ž"))
# this last one aborts the R session, so it is left commented out:
# dplyr::copy_to(con, df3, "df3", temporary = TRUE)
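
One way to narrow this down is to look at the declared encoding and the raw bytes of the column names. The following is only a sketch under assumptions: `inspect_names()` and `names_to_utf8()` are hypothetical helpers (not part of DBI, dplyr, or RPostgres), and the illustration assumes the bytes `0xc8 0x78` from the error message come from windows-1250, where `0xc8` is "Č" and `0x78` is "x". It has not been confirmed that re-encoding the names avoids the error or the crash.

```r
# Hypothetical helper: declared encoding and raw bytes of each column name.
inspect_names <- function(df) {
  lapply(names(df), function(nm) {
    list(encoding = Encoding(nm), bytes = charToRaw(nm))
  })
}

# Hypothetical helper: re-encode column names to UTF-8 before writing.
# enc2utf8() converts from the native encoding (or from latin1 if the
# string is marked as such); it does nothing useful if the names are
# already UTF-8.
names_to_utf8 <- function(df) {
  names(df) <- enc2utf8(names(df))
  df
}

# Illustration with the two bytes reported in the error message,
# assuming they are windows-1250 text.
bad  <- rawToChar(as.raw(c(0xc8, 0x78)))
good <- iconv(bad, from = "CP1250", to = "UTF-8")

validUTF8(bad)    # FALSE: 0xc8 starts a 2-byte sequence, 0x78 cannot continue it
validUTF8(good)   # TRUE
charToRaw(good)   # c4 8c 78
```

With the connection from above, `inspect_names(df2)` shows what R holds for the column name, and `dplyr::copy_to(con, names_to_utf8(df2), "df2", temporary = TRUE)` would be the corresponding attempt at a workaround.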

majazaloznik, Oct 19 '22 10:10