RSocrata icon indicating copy to clipboard operation
RSocrata copied to clipboard

Issue with Socrata's $query syntax

Open danielOKeefe opened this issue 6 years ago • 11 comments

Some valid Socrata urls do not work with RSocrata, specifically:

'https://data.cityofchicago.org/resource/ktn3-fjzw.csv?$query=select log_no',

Steps to replicate:

read.socrata('https://data.cityofchicago.org/resource/ktn3-fjzw.csv?$query=select log_no', app_token = '${YOUR_APP_TOKEN}')

danielOKeefe avatar Apr 20 '18 15:04 danielOKeefe

I've had some success creating my own personal functions for interacting with the Socrata API using the httr and dplyr packages...Super helpful and quick way to access the data. Here's an example:

Contains this function to do multiple gsubs at once:

mgsub <- function (pattern, replacement, x, ...)  {
    if (length(pattern) != length(replacement)) {
        stop("pattern and replacement do not have the same length.")
    }
    result <- x
    for (i in 1:length(pattern)) {
        result <- gsub(pattern[i], replacement[i], result, ...)
    }
    result
}

And this is an example of a function to access data from a specific dataset. Not a generalizable function but I could see if you knew the 4x4 how it could be. Uses httr and dplyr mainly:

getSLBasicInfo471 <- function (limit = 999999999, select = "", where = "", order = "", 
    group = "", having = "", offset = "", q = "", query = "", 
    bom = "", output = "df", nameType = "")  {
    webstring <- paste0("https://opendata.usac.org/resource/7nws-3i6r.csv?")
    if (limit != 999999999) {
        webstring <- paste0(webstring, "&$limit=", limit)
    }
    else {
        webstring <- paste0(webstring, "&$limit=", 999999999)
    }
    if (any(select != "")) {
        webstring <- paste0(webstring, "&$select=", select)
    }
    if (where != "") {
        webstring <- paste0(webstring, "&$where=", where)
    }
    if (order != "") {
        webstring <- paste0(webstring, "&$order=", order)
    }
    if (group != "") {
        webstring <- paste0(webstring, "&$group=", group)
    }
    if (having != "") {
        webstring <- paste0(webstring, "&$having=", having)
    }
    if (offset != "") {
        webstring <- paste0(webstring, "&$offset=", offset)
    }
    if (q != "") {
        webstring <- paste0(webstring, "&$q=", q)
    }
    if (query != "") {
        webstring <- paste0("https://opendata.usac.org/resource/7nws-3i6r.csv?$query=", 
            query)
    }
    if (bom != "") {
        webstring <- paste0(webstring, "&$$bom=", bom)
    }
    webstring <- mgsub(c(" ", "'"), c("%20", "%27"), webstring)
    if (output == "df") {
        usac <- suppressMessages(webstring %>% GET() %>% content %>% 
            as.data.frame)
        }
    return(usac)
    if (output == "url") {
        return(webstring)
    }
}

Is this helpful?

benjaminrobinson avatar May 18 '18 14:05 benjaminrobinson

A clue? I am getting a version of this error, when using an otherwise valid $query in RSocrata. It seems to add an extraneous $order=:id" clause at the end of the statement. I wonder if this causing the errors with $query

Steps to duplicate:

Here is query that runs in my browser window but fails in RSocrata: https://data.cityofnewyork.us/resource/gszd-efwt.csv?$query=select * where violation_location_house='635' and violation_location_street_name='WEST 42 STREET' order by violation_date DESC

q <- "https://data.cityofnewyork.us/resource/gszd-efwt.csv?$query=select * where violation_location_house='635' and violation_location_street_name='WEST 42 STREET' order by violation_date DESC" df <- read.socrata(q )

Here is Socrata error message

2018-12-18 14:31:26.962 getResponse: Error in httr GET: 400 https://data.cityofnewyork.us/resource/gszd-efwt.csv?%24query=select%20%2A%20where%20violation_location_house%3D%27635%27%20and%20violation_location_street_name%3D%27WEST%2042%20STREET%27%20order%20by%20violation_date%20DESC&$order=:id <--------------- Error in getResponse(validUrl, email, password) : Bad Request (HTTP 400).

When I put the error message url into the browser I get the following error message | "If $query is used, all options - [$order] should not be specified in $query." When I delete the &$order clause in browser, query runs fine.

--  

sunshine52 avatar Dec 18 '18 19:12 sunshine52

According to the SoQL documentation: The $query parameter allows you to combine multiple SoQL clauses together into a single parameter, for convenience. Similar to SQL, clauses must be specified in a specific order. It also identifies the SoQL $query as: "A full SoQL query string, all as one parameter."

From my understanding of the query option, its basically a combination of all the other pieces of the SODA so you would have to put the order statement into your query. So instead of saying $query = 'SELECT * WHERE city = 'New York'' for your query and then having $order=state you would have $query=SELECT * WHERE city = 'New York' ORDER BY state' as one complete query. Hope that make sense!

Don't know why your RSocrata query isn't working though...I really just recommend making a flavor of the custom function I have above...

benjaminrobinson avatar Dec 18 '18 19:12 benjaminrobinson

Thanks. That is exactly what I did. I wrote a conventional sql statement. But I got an error in RSocrata. And in the error message it sent back a url:encoded version of my query (including an order by statement), but with another phrase appended that I did not write ... order by blah.... &$order=:id.

So it seemed that RSocrata was appending that, which is probably something it does to enable paging of simpler $select and $where clauses. But it may be triggering an error in read.socrata statement. But I am a new R user, so not ready to wade into the source code.

When I took out the order by part of mysql. there was still an error, with the &$order=:id.

I think the SoQL error message that appeared when I ran the query in the browse indicated that you can't have a $order statement appended to a $query statement, but I am not certain of that.

On Tue, Dec 18, 2018 at 2:57 PM Ben Robinson [email protected] wrote:

According to the SoQL documentation https://dev.socrata.com/docs/queries/query.html: The $query parameter allows you to combine multiple SoQL clauses together into a single parameter, for convenience. Similar to SQL, clauses must be specified in a specific order. It also identifies the SoQL $query as: "A full SoQL query string, all as one parameter."

From my understanding of the query option, its basically a combination of all the other pieces of the SODA so you would have to put the order statement into your query. So instead of saying $query = 'SELECT * WHERE city = 'New York'' for your query and then having $order=state you would have $query=SELECT * WHERE city = 'New York' ORDER BY state' as one complete query. Hope that make sense!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Chicago/RSocrata/issues/154#issuecomment-448349632, or mute the thread https://github.com/notifications/unsubscribe-auth/AZQjVziADxaamEkT6fHfw0rzXtx0HOHZks5u6UimgaJpZM4Tdntu .

-- Josh Barbanel [email protected]

sunshine52 avatar Dec 18 '18 20:12 sunshine52

The problem seems to be in the Rsocrata read.socrata function. It checks for an "$order" parameter and if it doesn't exist and there is no count query, it adds 'order=:id'. This causes the "$query" queries to fail. There may need to be an if clause or an else if clause that finds a "$query statement and leaves that url untouched.

This is 9 lines into read.socrata. if(!orderTest & !countTest) # sort by Socrata unique identifier validUrl <- paste(validUrl, if(is.null(parsedUrl$query)) {'?'} else {"&"}, '$order=:id', sep='')

sunshine52 avatar Jan 24 '19 23:01 sunshine52

Hi @sunshine52, thank you for the details!

The dev branch has a PR that should address this issue. Are you able to check if that version of the package resolves the issue? It will be included with our next CRAN release.

nicklucius avatar Jan 25 '19 22:01 nicklucius

Thanks I had no idea. I got it to work with the nightly version.

On Fri, Jan 25, 2019 at 5:10 PM Nick Lucius [email protected] wrote:

Hi @sunshine52 https://github.com/sunshine52, thank you for the details!

The dev branch has a PR that should address this issue. Are you able to check if that version of the package resolves the issue? It will be included with our next CRAN release.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Chicago/RSocrata/issues/154#issuecomment-457748671, or mute the thread https://github.com/notifications/unsubscribe-auth/AZQjV03iF_DRXrY9GHMdNIJf6wEPxWzsks5vG4DOgaJpZM4Tdntu .

-- Josh Barbanel [email protected]

sunshine52 avatar Jan 25 '19 23:01 sunshine52

Good to hear, @sunshine52. When we move the code to the master branch, this issue will close out.

nicklucius avatar Jan 26 '19 03:01 nicklucius

This still seems to be an issue, I've tried the 'nightly' dev version and still get the bad request error in R: read.socrata("https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'")

However this works in my web browser:

https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'

The read.socrata function is still pasting &$order=:id to the end of the query string.

Ben-Cox avatar Dec 21 '19 01:12 Ben-Cox

I was able to run your query in Rsocrata. sessionInfo shows RSocrata_1.8.0-10 sessionInfo() R version 3.6.1 (2019-07-05) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Fedora 30 (Thirty) Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] reshape2_1.4.3 RMySQL_0.10.18 DBI_1.1.0 clipr_0.7.0 RSocrata_1.8.0-10 lubridate_1.7.4 [7] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.3 readr_1.3.1 tidyr_1.0.0 [13] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.2.1

read.socrata("https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'") tagcode 1 <NA> 2 <NA> 3 637092 4 <NA> 5 <NA> 6 055928 7 637027 8 637074 9 220262 10 055927...

On Fri, Dec 20, 2019 at 8:32 PM Ben-Cox [email protected] wrote:

This still seems to be an issue, I've tried the 'nightly' dev version and still get the bad request error in R: read.socrata("https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'")

However this works in my web browser:

https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'

The read.socrata function is still pasting &$order=:id to the end of the query string.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Chicago/RSocrata/issues/154?email_source=notifications&email_token=AGKCGV44K3YWZHTP3XRB24DQZVW4PA5CNFSM4E3WPNXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHOSEHY#issuecomment-568140319, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGKCGV3AUALHUNIOSNW63UDQZVW4PANCNFSM4E3WPNXA .

-- Josh Barbanel [email protected]

sunshine52 avatar Dec 21 '19 23:12 sunshine52

Well its working on my machine now too, and I didn't change anything. I'll chock it up to a Christmas miracle. Thanks!

Ben-Cox avatar Dec 23 '19 17:12 Ben-Cox