How to deal with `pq__varchar` columns
I have some queries that return columns of type pq__varchar. I was wondering if there is some simple way to convert these data into a more R-friendly format from within R?
Thanks, David
Thanks. Could you please provide a reproducible example? Please see https://reprex.tidyverse.org/ for guidance.
Given any RPostgres connection con to a PostgreSQL database, try the following:
x = dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
class(x$v)
Ideally I would be able to treat the elements of x$v like vectors/arrays in R, extracting elements as needed, but it isn't clear how to do this with a pq__varchar. I was wondering if there is some straightforward function or method for transforming these into something more familiar to R.
The less-than-ideal solution I eventually used was to parse the string representation of the array in R. This only works easily because I have 1-D arrays whose strings contain no quote characters or commas. A better solution working directly from the pq__varchar variable would be preferable. Here is an example of this approach:
x = dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
x$v_list = lapply(x$v,\(.) eval(parse(text=sub("[{]","c('",sub("[}]","')",gsub(",","','",.))))))
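A quick check of the parsed result, just to show it can now be indexed like an ordinary list of character vectors (the values follow from the query above):
# Inspect the parsed column and pull out individual elements
str(x$v_list)
x$v_list[[2]][3]  # "fgh"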
Reprex:
con <- DBI::dbConnect(RPostgres::Postgres())
x <- DBI::dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
x
#> v
#> 1 {a,b}
#> 2 {abc,def,fgh}
class(x$v)
#> [1] "pq__varchar"
Created on 2024-04-01 with reprex v2.1.0
@paleolimbot: Is there support for nested data in adbcpostgresql?
It seems so!
library(adbcdrivermanager)
con <- adbcpostgresql::adbcpostgresql() |>
adbc_database_init(uri = "postgresql://localhost:5432/postgres?user=postgres&password=password") |>
adbc_connection_init()
con |>
read_adbc(
"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a"
) |>
as.data.frame() |>
dplyr::pull()
#> <list_of<character>[2]>
#> [[1]]
#> [1] "a" "b"
#>
#> [[2]]
#> [1] "abc" "def" "fgh"
Created on 2024-04-01 with reprex v2.1.0
I say this lightly, but I actually spent quite a bit of time getting this to work. It also works for Postgres' "record" type, where it should give you back a nested data frame. There is a slight limitation on reading the array type... I believe multidimensional arrays will just be returned as a flat list_of and will (currently) lose the dimensions.
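If it helps, here is an untested sketch of a query to probe that flattening behaviour, reusing the con from above:
# Untested: a 2-D integer array; I would expect it to come back as a flat list_of
con |>
  read_adbc("SELECT ARRAY[[1,2],[3,4]]::int[] AS m") |>
  as.data.frame()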
Thanks, this is wonderful! I suspect it will also work via adbi, https://adbi.r-dbi.org/. @dmkaplan2000: can you confirm?
This brings up memories of my work bringing nested data to the duckdb R client. I do appreciate the effort!
I tried getting this to work with adbi, but I must admit I can't figure out how to successfully connect to a database. First I tried opening a connection with:
con2 = dbConnect(adbi::adbi("adbcpostgresql"),uri="postgresql://localhost:5432/DBNAME?user=USER&password=PASSWORD")
That worked without error, but when I tried to use the connection to execute a query I got:
> dbGetQuery(con2,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
Error in adbcdrivermanager::adbc_statement_get_parameter_schema(stmt) :
NOT_IMPLEMENTED
Based on what I could find online, I tried reducing the URI to uri="postgresql://localhost:5432". That also worked for starting the connection, but not for executing a query.
Do you know what I am doing wrong? The help on adbi and adbcpostgresql is pretty spartan...
It's a moonshot, but can you try dbGetQuery(immediate = TRUE)? This is DBI's current way of saying "we don't want a prepared query, just execute". I wonder if adbi takes action based on that flag, and if not, whether it should.
CC @nbenn.
immediate=TRUE fixed the issue!
> con2 = dbConnect(adbi::adbi("adbcpostgresql"),uri="postgresql://localhost:5432")
> x=dbGetQuery(con2,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a",immediate=TRUE)
> x$v
<list_of<character>[2]>
[[1]]
[1] "a" "b"
[[2]]
[1] "abc" "def" "fgh"
So this query works with adbi, but I am not sure it really solves my issue, in the sense that it fixes one thing while breaking another. My real queries involve, in addition to character arrays, spatial data provided by the PostGIS extension to PostgreSQL. RPostgres plays nicely with the sf package, so the spatial data can be processed, but I do not think adbi can be used with sf. At least my initial examination does not indicate that it can.
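For context, my spatial queries currently look roughly like this with RPostgres plus sf (the table and column names below are placeholders):
library(DBI)
library(sf)
con <- dbConnect(RPostgres::Postgres())
# st_read() accepts a DBI connection plus a query; 'my_table' and 'geom' are made up
pts <- st_read(con, query = "SELECT id, v, geom FROM my_table")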
Hi, I recently had to find a generic solution for converting 1-D PostgreSQL arrays in a pq__varchar column into a list of character vectors. The solution is detailed on Stack Overflow here, but the essential code is:
# Generate data
con <- DBI::dbConnect(RPostgres::Postgres())
q = "WITH a(v) AS (VALUES(ARRAY['''a''','b']::varchar[]),(ARRAY['abc, def','the \"big\" one','fgh']::varchar[])) SELECT * FROM a"
x <- DBI::dbGetQuery(con,q)
# Function that does the work on individual varchar arrays imported into R
pq_varchar_array_to_char = function(x) {
x |>
gsub("^[{]|[}]$", "", x = _) |>
gsub("[\\][\"]","%myspecialquote%",x = _) |>
scan(text = _, quote = '"', what = "", sep = ",", quiet = TRUE) |>
gsub("%myspecialquote%","\"",x = _)
}
# Apply function to each element of column of x
## Non-tidyverse version
out <- transform(x, v = lapply(v, pq_varchar_array_to_char))
## Tidyverse version
library(tidyverse)
out2 <- mutate(x, v = lapply(v, pq_varchar_array_to_char))
Perhaps some version of pq_varchar_array_to_char could be added to RPostgres, either applied directly to these column types or exposed as a helper function for users? (Note: my function will fail if any of the values happens to contain the text %myspecialquote%, but this could be fixed by testing candidate placeholders, for example |%|, |%%|, |%%%|, ..., until one is found that does not appear in the array, and using that as the temporary replacement for the escaped quotes.)
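A rough sketch of that placeholder idea (the helper name is made up):
# Pick a placeholder token that does not occur anywhere in the input strings
choose_placeholder <- function(x) {
  token <- "|%|"
  while (any(grepl(token, x, fixed = TRUE))) {
    token <- sub("[|]$", "%|", token)  # |%| -> |%%| -> |%%%| -> ...
  }
  token
}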
This function could also be extended to work with various types of PostgreSQL arrays (for example, via a generic convert_pq_array_to_r_vector function with methods for each column type that would apply the base function and then cast vector elements to the desired final value type).
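Something along these lines, perhaps (the name and interface are purely illustrative):
# Hypothetical generic wrapper: parse the array text, then cast each element
convert_pq_array_to_r_vector <- function(x, cast = identity) {
  lapply(x, function(el) cast(pq_varchar_array_to_char(el)))
}
# e.g. for an integer[] column:
# out <- transform(x, v = convert_pq_array_to_r_vector(v, cast = as.integer))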
The base pq_varchar_array_to_char function could probably also be extended to the case of N-D arrays, but that is not my use case for the time being...
Thanks. Does the VALUES() function return JSON in this example by any chance? Can you share an example of the input that pq_varchar_array_to_char() would see?
No, I don't think VALUES() returns JSON in this example. What pq_varchar_array_to_char() would see is as follows:
> con <- DBI::dbConnect(RPostgres::Postgres())
> q = "WITH a(v) AS (VALUES(ARRAY['''a''','b']::varchar[]),(ARRAY['abc, def','the \"big\" one','fgh']::varchar[])) SELECT * FROM a"
> x <- DBI::dbGetQuery(con,q)
> x
v
1 {'a',b}
2 {"abc, def","the \\"big\\" one",fgh}
> x$v[2]
[1] "{\"abc, def\",\"the \\\"big\\\" one\",fgh}"
> class(x)
[1] "data.frame"
> class(x$v)
[1] "pq__varchar"
> class(x$v[2])
[1] "character"
Is that what you wanted?
Yes, thanks. Maybe we could leverage the PostgreSQL json_agg() function to return proper JSON and parse it in R with jsonlite?
https://stackoverflow.com/a/24006432/946850
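A rough, untested sketch of that route (this uses array_to_json() per row; json_agg() would aggregate across rows instead):
# Untested: have PostgreSQL serialise each array as JSON, then parse it with jsonlite
q <- "WITH a(v) AS (VALUES(ARRAY['''a''','b']::varchar[]),(ARRAY['abc, def','the \"big\" one','fgh']::varchar[])) SELECT array_to_json(v)::text AS v FROM a"
x <- DBI::dbGetQuery(con, q)
x$v_list <- lapply(x$v, jsonlite::fromJSON)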