opendatatoronto
Strange error with get_resource for a .csv file = EOF within quoted string
First, excellent, super useful package. Thanks very much.
Second, I have hit one small snag. When I call get_resource() with the following code, the .csv file is parsed incorrectly.
active_building_permits <- search_packages("Active permits") %>% list_package_resources() %>% dplyr::filter(name == "Active permits (CSV)") %>% get_resource()
I have far fewer records than I should and information appears in the wrong columns. I get the following warning:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string
I know similar things can happen when calling read.csv or read.table directly, because of stray quotes in the data. See https://kodlogs.com/33766/in-scan-file-file-what-what-sep-sep-quote-quote-dec-dec-eof-within-quoted-string
But I'm not sure what is happening here.
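For readers who have not seen this warning before, here is a minimal sketch of how a stray quote can trigger it in base R. The file contents are made up for illustration and are not taken from the Active permits data, so this shows the general mechanism rather than a diagnosis of this particular file.

```r
# Hypothetical data: the second record contains an unmatched double quote.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,description",
  "1,garage",
  "2,6\" pipe replacement",
  "3,deck"
), path)

# Default quoting: the stray quote opens a quoted string that never closes,
# so read.csv() is expected to warn and swallow the lines that follow it.
bad <- read.csv(path)
nrow(bad)

# Disabling quote handling treats the quote as an ordinary character.
# Only safe when no field relies on quotes to protect embedded commas.
fixed <- read.csv(path, quote = "")
nrow(fixed)
```

Setting quote = "" is a blunt tool: it recovers every line in this toy file, but it would split fields incorrectly in a CSV that legitimately quotes values containing commas.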
Hi, thanks for the issue!
I have actually run into this problem myself, with this exact data set! The issue is definitely with the underlying CSV - read.csv doesn't seem to parse it properly, but readr::read_csv() does. Unfortunately, right now ckanr (the package that opendatatoronto uses to access the portal) uses read.csv and not readr::read_csv().
I'll open an issue over on ckanr with this - I'm the maintainer on that too, so will have a think about how to handle it.
In the meantime, you can access the file more manually by using ckanr functions and reading the CSV yourself - here is some code to do that:
library(opendatatoronto)
library(ckanr)
#> Loading required package: DBI
library(readr)
# Find the CSV resource for the package
active_building_permits <- search_packages("Active permits") %>%
  list_package_resources() %>%
  dplyr::filter(name == "Active permits (CSV)")
active_building_permits_id <- active_building_permits[["id"]]
# Get URL of resource
resource <- resource_show(active_building_permits_id, url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/", as = "list")
# Make a directory to save into
dir <- tempdir()
resource_dir <- fs::dir_create(paste0(dir, "/", active_building_permits_id))
# Save the ZIP file
save_path <- ckan_fetch(resource[["url"]], store = "disk", path = paste0(dir, "/", active_building_permits_id, "/", "res.zip"))
# Unzip it
csv_files <- unzip(save_path[["path"]], exdir = resource_dir)
# Read it in
res <- read_csv(csv_files)
#> Rows: 246434 Columns: 30
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (16): PERMIT_NUM, REVISION_NUM, PERMIT_TYPE, STRUCTURE_TYPE, WORK, STREE...
#> dbl (13): GEO_ID, APPLICATION_DATE, ISSUED_DATE, DWELLING_UNITS_CREATED, DWE...
#> lgl (1): COMPLETED_DATE
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dim(res)
#> [1] 246434 30
# Compare to via read.csv()
bad_res <- read.csv(csv_files)
#> Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
#> EOF within quoted string
dim(bad_res)
#> [1] 135718 30
Hope this is helpful in the meantime!
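Since the failure above only shows up as a warning, a rough row-count check can help catch a silently truncated parse. The sketch below uses a hypothetical helper, check_row_count(), which is not part of opendatatoronto, ckanr, or readr. It assumes one record per physical line, so a file with line breaks inside quoted fields would be flagged even when parsed correctly.

```r
# Hypothetical helper: compare parsed rows against physical data lines.
check_row_count <- function(path, parsed) {
  # Number of lines in the file, minus one for the header row
  n_lines <- length(readLines(path, warn = FALSE)) - 1L
  list(
    parsed_rows = nrow(parsed),
    data_lines = n_lines,
    looks_truncated = nrow(parsed) < n_lines
  )
}

# Small made-up file to show the helper on a clean parse
path <- tempfile(fileext = ".csv")
writeLines(c("id,description", "1,garage", "2,deck", "3,porch"), path)
parsed <- read.csv(path)
check_row_count(path, parsed)
```

With the objects from the code above, check_row_count(csv_files, bad_res) would be the equivalent call for the read.csv() result.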
This works perfectly. Thanks so much for the quick fix (and the great package). Shall I close the issue since the manual code works, or would you like me to leave it open as a placeholder to think about?
Great, so glad it worked for you! Let's leave it open as a placeholder - thanks!