RSocrata
RSocrata copied to clipboard
Issue with Socrata's $query syntax
Some valid Socrata urls do not work with RSocrata, specifically:
'https://data.cityofchicago.org/resource/ktn3-fjzw.csv?$query=select log_no',
Steps to replicate:
read.socrata('https://data.cityofchicago.org/resource/ktn3-fjzw.csv?$query=select log_no', app_token = '${YOUR_APP_TOKEN}')
I've had some success creating my own personal functions for interacting with the Socrata API using the httr and dplyr packages...Super helpful and quick way to access the data. Here's an example:
Contains this function to do multiple gsubs at once:
mgsub <- function (pattern, replacement, x, ...) {
if (length(pattern) != length(replacement)) {
stop("pattern and replacement do not have the same length.")
}
result <- x
for (i in 1:length(pattern)) {
result <- gsub(pattern[i], replacement[i], result, ...)
}
result
}
And this is an example of a function to access data from a specific dataset. Not a generalizable function but I could see if you knew the 4x4 how it could be. Uses httr
and dplyr
mainly:
getSLBasicInfo471 <- function (limit = 999999999, select = "", where = "", order = "",
group = "", having = "", offset = "", q = "", query = "",
bom = "", output = "df", nameType = "") {
webstring <- paste0("https://opendata.usac.org/resource/7nws-3i6r.csv?")
if (limit != 999999999) {
webstring <- paste0(webstring, "&$limit=", limit)
}
else {
webstring <- paste0(webstring, "&$limit=", 999999999)
}
if (any(select != "")) {
webstring <- paste0(webstring, "&$select=", select)
}
if (where != "") {
webstring <- paste0(webstring, "&$where=", where)
}
if (order != "") {
webstring <- paste0(webstring, "&$order=", order)
}
if (group != "") {
webstring <- paste0(webstring, "&$group=", group)
}
if (having != "") {
webstring <- paste0(webstring, "&$having=", having)
}
if (offset != "") {
webstring <- paste0(webstring, "&$offset=", offset)
}
if (q != "") {
webstring <- paste0(webstring, "&$q=", q)
}
if (query != "") {
webstring <- paste0("https://opendata.usac.org/resource/7nws-3i6r.csv?$query=",
query)
}
if (bom != "") {
webstring <- paste0(webstring, "&$$bom=", bom)
}
webstring <- mgsub(c(" ", "'"), c("%20", "%27"), webstring)
if (output == "df") {
usac <- suppressMessages(webstring %>% GET() %>% content %>%
as.data.frame)
}
return(usac)
if (output == "url") {
return(webstring)
}
}
Is this helpful?
A clue? I am getting a version of this error, when using an otherwise valid $query in RSocrata. It seems to add an extraneous $order=:id" clause at the end of the statement. I wonder if this causing the errors with $query
Steps to duplicate:
Here is query that runs in my browser window but fails in RSocrata: https://data.cityofnewyork.us/resource/gszd-efwt.csv?$query=select * where violation_location_house='635' and violation_location_street_name='WEST 42 STREET' order by violation_date DESC
q <- "https://data.cityofnewyork.us/resource/gszd-efwt.csv?$query=select * where violation_location_house='635' and violation_location_street_name='WEST 42 STREET' order by violation_date DESC" df <- read.socrata(q )
Here is Socrata error message
2018-12-18 14:31:26.962 getResponse: Error in httr GET: 400 https://data.cityofnewyork.us/resource/gszd-efwt.csv?%24query=select%20%2A%20where%20violation_location_house%3D%27635%27%20and%20violation_location_street_name%3D%27WEST%2042%20STREET%27%20order%20by%20violation_date%20DESC&$order=:id <--------------- Error in getResponse(validUrl, email, password) : Bad Request (HTTP 400).
When I put the error message url into the browser I get the following error message | "If $query is used, all options - [$order] should not be specified in $query." When I delete the &$order clause in browser, query runs fine.
--
According to the SoQL documentation:
The $query parameter allows you to combine multiple SoQL clauses together into a single parameter, for convenience. Similar to SQL, clauses must be specified in a specific order.
It also identifies the SoQL $query
as: "A full SoQL query string, all as one parameter
."
From my understanding of the query
option, its basically a combination of all the other pieces of the SODA so you would have to put the order
statement into your query. So instead of saying $query = 'SELECT * WHERE city = 'New York''
for your query and then having $order=state
you would have $query=SELECT * WHERE city = 'New York' ORDER BY state'
as one complete query. Hope that make sense!
Don't know why your RSocrata query isn't working though...I really just recommend making a flavor of the custom function I have above...
Thanks. That is exactly what I did. I wrote a conventional sql statement. But I got an error in RSocrata. And in the error message it sent back a url:encoded version of my query (including an order by statement), but with another phrase appended that I did not write ... order by blah.... &$order=:id.
So it seemed that RSocrata was appending that, which is probably something it does to enable paging of simpler $select and $where clauses. But it may be triggering an error in read.socrata statement. But I am a new R user, so not ready to wade into the source code.
When I took out the order by part of mysql. there was still an error, with the &$order=:id.
I think the SoQL error message that appeared when I ran the query in the browse indicated that you can't have a $order statement appended to a $query statement, but I am not certain of that.
On Tue, Dec 18, 2018 at 2:57 PM Ben Robinson [email protected] wrote:
According to the SoQL documentation https://dev.socrata.com/docs/queries/query.html: The $query parameter allows you to combine multiple SoQL clauses together into a single parameter, for convenience. Similar to SQL, clauses must be specified in a specific order. It also identifies the SoQL $query as: "A full SoQL query string, all as one parameter."
From my understanding of the query option, its basically a combination of all the other pieces of the SODA so you would have to put the order statement into your query. So instead of saying $query = 'SELECT * WHERE city = 'New York'' for your query and then having $order=state you would have $query=SELECT * WHERE city = 'New York' ORDER BY state' as one complete query. Hope that make sense!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Chicago/RSocrata/issues/154#issuecomment-448349632, or mute the thread https://github.com/notifications/unsubscribe-auth/AZQjVziADxaamEkT6fHfw0rzXtx0HOHZks5u6UimgaJpZM4Tdntu .
-- Josh Barbanel [email protected]
The problem seems to be in the Rsocrata read.socrata function. It checks for an "$order" parameter and if it doesn't exist and there is no count query, it adds 'order=:id'. This causes the "$query" queries to fail. There may need to be an if clause or an else if clause that finds a "$query statement and leaves that url untouched.
This is 9 lines into read.socrata. if(!orderTest & !countTest) # sort by Socrata unique identifier validUrl <- paste(validUrl, if(is.null(parsedUrl$query)) {'?'} else {"&"}, '$order=:id', sep='')
Hi @sunshine52, thank you for the details!
The dev
branch has a PR that should address this issue. Are you able to check if that version of the package resolves the issue? It will be included with our next CRAN release.
Thanks I had no idea. I got it to work with the nightly version.
On Fri, Jan 25, 2019 at 5:10 PM Nick Lucius [email protected] wrote:
Hi @sunshine52 https://github.com/sunshine52, thank you for the details!
The dev branch has a PR that should address this issue. Are you able to check if that version of the package resolves the issue? It will be included with our next CRAN release.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Chicago/RSocrata/issues/154#issuecomment-457748671, or mute the thread https://github.com/notifications/unsubscribe-auth/AZQjV03iF_DRXrY9GHMdNIJf6wEPxWzsks5vG4DOgaJpZM4Tdntu .
-- Josh Barbanel [email protected]
Good to hear, @sunshine52. When we move the code to the master branch, this issue will close out.
This still seems to be an issue, I've tried the 'nightly' dev version and still get the bad request error in R:
read.socrata("https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'")
However this works in my web browser:
https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'
The read.socrata
function is still pasting &$order=:id
to the end of the query string.
I was able to run your query in Rsocrata. sessionInfo shows RSocrata_1.8.0-10 sessionInfo() R version 3.6.1 (2019-07-05) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Fedora 30 (Thirty) Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] reshape2_1.4.3 RMySQL_0.10.18 DBI_1.1.0 clipr_0.7.0 RSocrata_1.8.0-10 lubridate_1.7.4 [7] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.3 readr_1.3.1 tidyr_1.0.0 [13] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.2.1
read.socrata("https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'") tagcode 1 <NA> 2 <NA> 3 637092 4 <NA> 5 <NA> 6 055928 7 637027 8 637074 9 220262 10 055927...
On Fri, Dec 20, 2019 at 8:32 PM Ben-Cox [email protected] wrote:
This still seems to be an issue, I've tried the 'nightly' dev version and still get the bad request error in R: read.socrata("https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'")
However this works in my web browser:
https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'
The read.socrata function is still pasting &$order=:id to the end of the query string.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Chicago/RSocrata/issues/154?email_source=notifications&email_token=AGKCGV44K3YWZHTP3XRB24DQZVW4PA5CNFSM4E3WPNXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHOSEHY#issuecomment-568140319, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGKCGV3AUALHUNIOSNW63UDQZVW4PANCNFSM4E3WPNXA .
-- Josh Barbanel [email protected]
Well its working on my machine now too, and I didn't change anything. I'll chock it up to a Christmas miracle. Thanks!