covidregionaldata
Country specific wrappers for JHU/Google data
At the moment, accessing the regional data imported via the JHU and Google wrappers is a little clunky, tedious, and a bit opaque. It would be really nice to bring access to this data up to the same level as other data sources. One of the current issues is discoverability: we don't broadcast to users what JHU/Google support until they access those classes, and these regions don't work via get_available_datasets() or via get_regional_data(). It is also very slow to clean and process these big data sets, which is a large waste of time if you are only interested in a single region.
This can be done in several ways:
- Manually add the source by looking at where JHU/Google get their data from
- Import via our current integrations with JHU and Google (using a child class with more specific defaults) and add documentation based on the original source
- Automagically import and write new classes using a script in data-raw.
Personally, I think manually adding using the JHU/Google integrations is probably the way to go in terms of giving the best documentation and ease of use.
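To make the child-class option concrete, here is a rough sketch of the pattern using R6. Note that `MultiCountrySource` is a stand-in I made up for this illustration, not the package's real JHU or Google class, and `ChinaViaSource` and `source_note` are hypothetical names. The rows are toy values. The only point is that a thin subclass can fix the country by default and carry its own documentation.

```r
library(R6)

# Stand-in for a multi-country source such as JHU or Google.
# This is NOT the package's real class; it only mimics a
# get -> filter workflow using a tiny in-memory table.
MultiCountrySource <- R6Class("MultiCountrySource",
  public = list(
    data = NULL,
    country = NULL,
    initialize = function(country = NULL) {
      self$country <- country
    },
    get = function() {
      # Toy rows standing in for a download (illustrative only)
      self$data <- data.frame(
        country = c("China", "China", "Portugal"),
        region = c("Anhui", "Beijing", "Lisboa"),
        cases_total = c(1, 14, 0),
        stringsAsFactors = FALSE
      )
      # Apply the country default, if one was set
      if (!is.null(self$country)) {
        self$data <- self$data[self$data$country == self$country, ]
      }
      invisible(self)
    },
    available_regions = function() {
      unique(self$data$region)
    }
  )
)

# Hypothetical country wrapper: same machinery, country fixed by default,
# plus a field where the original upstream source could be documented.
ChinaViaSource <- R6Class("ChinaViaSource",
  inherit = MultiCountrySource,
  public = list(
    source_note = "Document the original upstream source here",
    initialize = function() {
      super$initialize(country = "China")
    }
  )
)

china <- ChinaViaSource$new()
china$get()
china$available_regions()
```

With something like this, a user would only ever see a country-level class, and the shared download/clean code would stay in one place.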
Tagging for discussion @epiforecasts/covidregionaldata @Bisaloo (appreciate your thoughts). I am happy to look at how to do this but it might take a while, so I am also very happy for anyone else interested to have a crack.
Example of accessing the data for a JHU supported region and a Google supported region:
library(covidregionaldata)
jhu <- JHU$new(level = "2")
jhu$get()
#> Downloading data from https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
#> Rows: 279 Columns: 566
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): Province/State, Country/Region
#> dbl (564): Lat, Long, 1/22/20, 1/23/20, 1/24/20, 1/25/20, 1/26/20, 1/27/20, ...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Downloading data from https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv
#> Rows: 279 Columns: 566
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): Province/State, Country/Region
#> dbl (564): Lat, Long, 1/22/20, 1/23/20, 1/24/20, 1/25/20, 1/26/20, 1/27/20, ...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Downloading data from https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv
#> Rows: 264 Columns: 566
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): Province/State, Country/Region
#> dbl (564): Lat, Long, 1/22/20, 1/23/20, 1/24/20, 1/25/20, 1/26/20, 1/27/20, ...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Cleaning data
#> Processing data
jhu$available_regions()
#> [1] "Australia" "Canada" "China" "Denmark"
#> [5] "France" "Netherlands" "New Zealand" "United Kingdom"
jhu$filter("China")
#> Filtering data to: China
jhu$process()
#> Processing data
jhu$data
#> $raw
#> $raw$daily_confirmed
#> # A tibble: 279 × 566
#> `Province/State` `Country/Region` Lat Long `1/22/20` `1/23/20` `1/24/20`
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 <NA> Afghanistan 33.9 67.7 0 0 0
#> 2 <NA> Albania 41.2 20.2 0 0 0
#> 3 <NA> Algeria 28.0 1.66 0 0 0
#> 4 <NA> Andorra 42.5 1.52 0 0 0
#> 5 <NA> Angola -11.2 17.9 0 0 0
#> 6 <NA> Antigua and Bar… 17.1 -61.8 0 0 0
#> 7 <NA> Argentina -38.4 -63.6 0 0 0
#> 8 <NA> Armenia 40.1 45.0 0 0 0
#> 9 Australian Capit… Australia -35.5 149. 0 0 0
#> 10 New South Wales Australia -33.9 151. 0 0 0
#> # … with 269 more rows, and 559 more variables: 1/25/20 <dbl>, 1/26/20 <dbl>,
#> # 1/27/20 <dbl>, 1/28/20 <dbl>, 1/29/20 <dbl>, 1/30/20 <dbl>, 1/31/20 <dbl>,
#> # 2/1/20 <dbl>, 2/2/20 <dbl>, 2/3/20 <dbl>, 2/4/20 <dbl>, 2/5/20 <dbl>,
#> # 2/6/20 <dbl>, 2/7/20 <dbl>, 2/8/20 <dbl>, 2/9/20 <dbl>, 2/10/20 <dbl>,
#> # 2/11/20 <dbl>, 2/12/20 <dbl>, 2/13/20 <dbl>, 2/14/20 <dbl>, 2/15/20 <dbl>,
#> # 2/16/20 <dbl>, 2/17/20 <dbl>, 2/18/20 <dbl>, 2/19/20 <dbl>, 2/20/20 <dbl>,
#> # 2/21/20 <dbl>, 2/22/20 <dbl>, 2/23/20 <dbl>, 2/24/20 <dbl>, …
#>
#> $raw$daily_deaths
#> # A tibble: 279 × 566
#> `Province/State` `Country/Region` Lat Long `1/22/20` `1/23/20` `1/24/20`
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 <NA> Afghanistan 33.9 67.7 0 0 0
#> 2 <NA> Albania 41.2 20.2 0 0 0
#> 3 <NA> Algeria 28.0 1.66 0 0 0
#> 4 <NA> Andorra 42.5 1.52 0 0 0
#> 5 <NA> Angola -11.2 17.9 0 0 0
#> 6 <NA> Antigua and Bar… 17.1 -61.8 0 0 0
#> 7 <NA> Argentina -38.4 -63.6 0 0 0
#> 8 <NA> Armenia 40.1 45.0 0 0 0
#> 9 Australian Capit… Australia -35.5 149. 0 0 0
#> 10 New South Wales Australia -33.9 151. 0 0 0
#> # … with 269 more rows, and 559 more variables: 1/25/20 <dbl>, 1/26/20 <dbl>,
#> # 1/27/20 <dbl>, 1/28/20 <dbl>, 1/29/20 <dbl>, 1/30/20 <dbl>, 1/31/20 <dbl>,
#> # 2/1/20 <dbl>, 2/2/20 <dbl>, 2/3/20 <dbl>, 2/4/20 <dbl>, 2/5/20 <dbl>,
#> # 2/6/20 <dbl>, 2/7/20 <dbl>, 2/8/20 <dbl>, 2/9/20 <dbl>, 2/10/20 <dbl>,
#> # 2/11/20 <dbl>, 2/12/20 <dbl>, 2/13/20 <dbl>, 2/14/20 <dbl>, 2/15/20 <dbl>,
#> # 2/16/20 <dbl>, 2/17/20 <dbl>, 2/18/20 <dbl>, 2/19/20 <dbl>, 2/20/20 <dbl>,
#> # 2/21/20 <dbl>, 2/22/20 <dbl>, 2/23/20 <dbl>, 2/24/20 <dbl>, …
#>
#> $raw$daily_recovered
#> # A tibble: 264 × 566
#> `Province/State` `Country/Region` Lat Long `1/22/20` `1/23/20` `1/24/20`
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 <NA> Afghanistan 33.9 67.7 0 0 0
#> 2 <NA> Albania 41.2 20.2 0 0 0
#> 3 <NA> Algeria 28.0 1.66 0 0 0
#> 4 <NA> Andorra 42.5 1.52 0 0 0
#> 5 <NA> Angola -11.2 17.9 0 0 0
#> 6 <NA> Antigua and Bar… 17.1 -61.8 0 0 0
#> 7 <NA> Argentina -38.4 -63.6 0 0 0
#> 8 <NA> Armenia 40.1 45.0 0 0 0
#> 9 Australian Capit… Australia -35.5 149. 0 0 0
#> 10 New South Wales Australia -33.9 151. 0 0 0
#> # … with 254 more rows, and 559 more variables: 1/25/20 <dbl>, 1/26/20 <dbl>,
#> # 1/27/20 <dbl>, 1/28/20 <dbl>, 1/29/20 <dbl>, 1/30/20 <dbl>, 1/31/20 <dbl>,
#> # 2/1/20 <dbl>, 2/2/20 <dbl>, 2/3/20 <dbl>, 2/4/20 <dbl>, 2/5/20 <dbl>,
#> # 2/6/20 <dbl>, 2/7/20 <dbl>, 2/8/20 <dbl>, 2/9/20 <dbl>, 2/10/20 <dbl>,
#> # 2/11/20 <dbl>, 2/12/20 <dbl>, 2/13/20 <dbl>, 2/14/20 <dbl>, 2/15/20 <dbl>,
#> # 2/16/20 <dbl>, 2/17/20 <dbl>, 2/18/20 <dbl>, 2/19/20 <dbl>, 2/20/20 <dbl>,
#> # 2/21/20 <dbl>, 2/22/20 <dbl>, 2/23/20 <dbl>, 2/24/20 <dbl>, …
#>
#>
#> $clean
#> # A tibble: 160,170 × 10
#> date level_1_region level_1_region_code level_2_region level_2_region_…
#> <date> <chr> <chr> <chr> <dbl>
#> 1 2020-01-22 Afghanistan AFG <NA> NA
#> 2 2020-01-23 Afghanistan AFG <NA> NA
#> 3 2020-01-24 Afghanistan AFG <NA> NA
#> 4 2020-01-25 Afghanistan AFG <NA> NA
#> 5 2020-01-26 Afghanistan AFG <NA> NA
#> 6 2020-01-27 Afghanistan AFG <NA> NA
#> 7 2020-01-28 Afghanistan AFG <NA> NA
#> 8 2020-01-29 Afghanistan AFG <NA> NA
#> 9 2020-01-30 Afghanistan AFG <NA> NA
#> 10 2020-01-31 Afghanistan AFG <NA> NA
#> # … with 160,160 more rows, and 5 more variables: cases_total <dbl>,
#> # deaths_total <dbl>, recovered_total <dbl>, Lat <dbl>, Long <dbl>
#>
#> $filtered
#> # A tibble: 20,232 × 10
#> date level_1_region level_1_region_code level_2_region level_2_region_…
#> <date> <chr> <chr> <chr> <dbl>
#> 1 2020-01-22 China CHN Anhui NA
#> 2 2020-01-23 China CHN Anhui NA
#> 3 2020-01-24 China CHN Anhui NA
#> 4 2020-01-25 China CHN Anhui NA
#> 5 2020-01-26 China CHN Anhui NA
#> 6 2020-01-27 China CHN Anhui NA
#> 7 2020-01-28 China CHN Anhui NA
#> 8 2020-01-29 China CHN Anhui NA
#> 9 2020-01-30 China CHN Anhui NA
#> 10 2020-01-31 China CHN Anhui NA
#> # … with 20,222 more rows, and 5 more variables: cases_total <dbl>,
#> # deaths_total <dbl>, recovered_total <dbl>, Lat <dbl>, Long <dbl>
#>
#> $processed
#> # A tibble: 20,232 × 17
#> date country iso_3166_1_alpha_3 region iso_code cases_new cases_total
#> <date> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 2020-01-22 China CHN Anhui NA 1 1
#> 2 2020-01-22 China CHN Beijing NA 14 14
#> 3 2020-01-22 China CHN Chongqing NA 6 6
#> 4 2020-01-22 China CHN Fujian NA 1 1
#> 5 2020-01-22 China CHN Gansu NA 0 0
#> 6 2020-01-22 China CHN Guangdong NA 26 26
#> 7 2020-01-22 China CHN Guangxi NA 2 2
#> 8 2020-01-22 China CHN Guizhou NA 1 1
#> 9 2020-01-22 China CHN Hainan NA 4 4
#> 10 2020-01-22 China CHN Hebei NA 1 1
#> # … with 20,222 more rows, and 10 more variables: deaths_new <dbl>,
#> # deaths_total <dbl>, recovered_new <dbl>, recovered_total <dbl>,
#> # hosp_new <dbl>, hosp_total <dbl>, tested_new <dbl>, tested_total <dbl>,
#> # Lat <dbl>, Long <dbl>
google <- Google$new(level = "2", get = TRUE)
#> Downloading data from https://storage.googleapis.com/covid19-open-data/v2/epidemiology.csv
#> Rows: 7534538 Columns: 10
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): key
#> dbl (8): new_confirmed, new_deceased, new_recovered, new_tested, total_conf...
#> date (1): date
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Downloading data from https://storage.googleapis.com/covid19-open-data/v2/hospitalizations.csv
#> Rows: 1003422 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): key
#> dbl (9): new_hospitalized, total_hospitalized, current_hospitalized, new_in...
#> date (1): date
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Downloading data from https://storage.googleapis.com/covid19-open-data/v2/index.csv
#> Rows: 22578 Columns: 15
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (14): key, place_id, wikidata, datacommons, country_code, country_name, ...
#> dbl (1): aggregation_level
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Cleaning data
#> Processing data
google$available_regions()
#> [1] "Switzerland" "Argentina"
#> [3] "Brazil" "Spain"
#> [5] "Germany" "France"
#> [7] "Indonesia" "Thailand"
#> [9] "United States of America" "Japan"
#> [11] "South Korea" "China"
#> [13] "Ukraine" "Philippines"
#> [15] "Australia" "Canada"
#> [17] "Taiwan" "United Kingdom"
#> [19] "Sweden" "Estonia"
#> [21] "Mexico" "Italy"
#> [23] "Austria" "Pakistan"
#> [25] "Portugal" "Belgium"
#> [27] "Czech Republic" "Chile"
#> [29] "Peru" "Colombia"
#> [31] "Israel" "Netherlands"
#> [33] "India" "Poland"
#> [35] "Haiti" "Norway"
#> [37] "Afghanistan" "Mozambique"
#> [39] "Russia" "South Africa"
#> [41] "Sierra Leone" "Romania"
#> [43] "Democratic Republic of the Congo" "Venezuela"
#> [45] "Sudan" "Kenya"
#> [47] "Bangladesh" "Libya"
google$filter("portugal")
#> Filtering data to: Portugal
google$process()
#> Processing data
Created on 2021-08-06 by the reprex package (v2.0.0)
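In the JHU example above, the full global tables are cleaned (160,170 rows) before being filtered down to China (20,232 rows). One way a country wrapper might avoid that cost is to subset the raw wide table first and reshape afterwards. The sketch below shows the idea in base R on a toy table in the JHU layout. The values are illustrative only, and whether this ordering fits the package's class structure is an open question.

```r
# Toy wide table in the JHU layout (values are illustrative only)
raw <- data.frame(
  `Province/State` = c(NA, "Anhui", "Beijing"),
  `Country/Region` = c("Afghanistan", "China", "China"),
  `1/22/20` = c(0, 1, 14),
  `1/23/20` = c(0, 9, 22),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Step 1: subset to the country of interest while the table is still wide
china_raw <- raw[raw[["Country/Region"]] == "China", ]

# Step 2: only now reshape to one row per region and date
id_cols <- c("Province/State", "Country/Region")
date_cols <- setdiff(names(china_raw), id_cols)
china_long <- data.frame(
  region = rep(china_raw[["Province/State"]], times = length(date_cols)),
  date = as.Date(rep(date_cols, each = nrow(china_raw)), format = "%m/%d/%y"),
  cases_total = unlist(china_raw[date_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

china_long
```

The reshape then only touches the rows for the requested country rather than every row in the global file.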
Example for a fully supported country:
library(covidregionaldata)
italy <- Italy$new(get = TRUE)
#> Downloading data from https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni.csv
#> Rows: 11109 Columns: 30
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (8): stato, codice_regione, denominazione_regione, note, note_test, no...
#> dbl (21): lat, long, ricoverati_con_sintomi, terapia_intensiva, totale_ospe...
#> dttm (1): data
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Cleaning data
#> Processing data
italy$available_regions()
#> [1] "Abruzzo" "Basilicata" "Calabria"
#> [4] "Campania" "Emilia-Romagna" "Friuli Venezia Giulia"
#> [7] "Lazio" "Liguria" "Lombardia"
#> [10] "Marche" "Molise" "Piemonte"
#> [13] "Puglia" "Sardegna" "Sicilia"
#> [16] "Toscana" "Trentino-Alto Adige" "Umbria"
#> [19] "Valle d'Aosta" "Veneto"
italy$supported_levels
#> [[1]]
#> [1] "1"
italy$data
#> $raw
#> $raw$main
#> # A tibble: 11,109 × 30
#> data stato codice_regione denominazione_regione lat long
#> <dttm> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2020-02-24 18:00:00 ITA 13 Abruzzo 42.4 13.4
#> 2 2020-02-24 18:00:00 ITA 17 Basilicata 40.6 15.8
#> 3 2020-02-24 18:00:00 ITA 18 Calabria 38.9 16.6
#> 4 2020-02-24 18:00:00 ITA 15 Campania 40.8 14.3
#> 5 2020-02-24 18:00:00 ITA 08 Emilia-Romagna 44.5 11.3
#> 6 2020-02-24 18:00:00 ITA 06 Friuli Venezia Giulia 45.6 13.8
#> 7 2020-02-24 18:00:00 ITA 12 Lazio 41.9 12.5
#> 8 2020-02-24 18:00:00 ITA 07 Liguria 44.4 8.93
#> 9 2020-02-24 18:00:00 ITA 03 Lombardia 45.5 9.19
#> 10 2020-02-24 18:00:00 ITA 11 Marche 43.6 13.5
#> # … with 11,099 more rows, and 24 more variables: ricoverati_con_sintomi <dbl>,
#> # terapia_intensiva <dbl>, totale_ospedalizzati <dbl>,
#> # isolamento_domiciliare <dbl>, totale_positivi <dbl>,
#> # variazione_totale_positivi <dbl>, nuovi_positivi <dbl>,
#> # dimessi_guariti <dbl>, deceduti <dbl>, casi_da_sospetto_diagnostico <dbl>,
#> # casi_da_screening <dbl>, totale_casi <dbl>, tamponi <dbl>,
#> # casi_testati <dbl>, note <chr>, ingressi_terapia_intensiva <dbl>, …
#>
#>
#> $clean
#> # A tibble: 10,580 × 6
#> date level_1_region level_1_region_code cases_total deaths_total
#> <date> <chr> <chr> <dbl> <dbl>
#> 1 2020-02-24 Abruzzo IT-65 0 0
#> 2 2020-02-24 Basilicata IT-77 0 0
#> 3 2020-02-24 Calabria IT-78 0 0
#> 4 2020-02-24 Campania IT-72 0 0
#> 5 2020-02-24 Emilia-Romagna IT-45 18 0
#> 6 2020-02-24 Friuli Venezia Giulia IT-36 0 0
#> 7 2020-02-24 Lazio IT-62 3 0
#> 8 2020-02-24 Liguria IT-42 0 0
#> 9 2020-02-24 Lombardia IT-25 172 6
#> 10 2020-02-24 Marche IT-57 0 0
#> # … with 10,570 more rows, and 1 more variable: tested_total <dbl>
#>
#> $filtered
#> # A tibble: 10,580 × 6
#> date level_1_region level_1_region_code cases_total deaths_total
#> <date> <chr> <chr> <dbl> <dbl>
#> 1 2020-02-24 Abruzzo IT-65 0 0
#> 2 2020-02-24 Basilicata IT-77 0 0
#> 3 2020-02-24 Calabria IT-78 0 0
#> 4 2020-02-24 Campania IT-72 0 0
#> 5 2020-02-24 Emilia-Romagna IT-45 18 0
#> 6 2020-02-24 Friuli Venezia Giulia IT-36 0 0
#> 7 2020-02-24 Lazio IT-62 3 0
#> 8 2020-02-24 Liguria IT-42 0 0
#> 9 2020-02-24 Lombardia IT-25 172 6
#> 10 2020-02-24 Marche IT-57 0 0
#> # … with 10,570 more rows, and 1 more variable: tested_total <dbl>
#>
#> $processed
#> # A tibble: 10,580 × 13
#> date regioni iso_3166_2 cases_new cases_total deaths_new deaths_total
#> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 2020-02-24 Abruzzo IT-65 0 0 0 0
#> 2 2020-02-24 Basilica… IT-77 0 0 0 0
#> 3 2020-02-24 Calabria IT-78 0 0 0 0
#> 4 2020-02-24 Campania IT-72 0 0 0 0
#> 5 2020-02-24 Emilia-R… IT-45 18 18 0 0
#> 6 2020-02-24 Friuli V… IT-36 0 0 0 0
#> 7 2020-02-24 Lazio IT-62 3 3 0 0
#> 8 2020-02-24 Liguria IT-42 0 0 0 0
#> 9 2020-02-24 Lombardia IT-25 172 172 6 6
#> 10 2020-02-24 Marche IT-57 0 0 0 0
#> # … with 10,570 more rows, and 6 more variables: recovered_new <dbl>,
#> # recovered_total <dbl>, hosp_new <dbl>, hosp_total <dbl>, tested_new <dbl>,
#> # tested_total <dbl>
Created on 2021-08-06 by the reprex package (v2.0.0)