dbplyr
dbplyr copied to clipboard
`copy_to` creates tables with invalid column names
I use dbplyr with a database where some column names are invalid.
If a table with invalid column name is copied to the data base, invalid
names are changed to valid ones. Thus when using copy_to
, the table in the
database might have column names than are different from the original data frame.
This was no problem in dbplyr 2.1.1, but in dbplyr 2.2.1 the returned tbl object contains the old (invalid) column names instead of the new correct ones. This leads to bugs when working with the table.
A closer look into the code showed that the following line led to the problem: https://github.com/tidyverse/dbplyr/blame/main/R/verb-copy-to.R#L86
Before the line was changes, the column names were read from the table. Now it is assumed that they are the same as the ones in the data frame. This assumption isn't true in my case and I think that there are also other data bases where this isn't necessarily true (random example: https://www.techtalkcorner.com/parquet-files-column-names-with-spaces/).
Can we either
- get back to the old behaviour in
copy_to
to read the column names from the table or - add a parameter to switch the behavior (use column names from data frame or read them from table)?
It feels quite unexpected to me that the database changes the column names. So, I'm not a big fan of the old behaviour. At least, this should be a warning. So it also does not make sense to me to add such a parameter. Instead you should rather rename the columns before copying. But the current behaviour is not that good either. Though, it feels like quite an edge case.
@hadley What's your opinion on that?
Hmmmmmm, given that some databases do this, and we can't really protect it, it seems like it would be safest to ask the database for the column names, and then maybe warn the user if they're changed from what they supplied?