covid-dashboard

Process data at a finer spatial granularity

Open · GaelVaroquaux opened this issue 4 years ago · 10 comments

The data from Johns Hopkins come at the level of regions/provinces. Ideally, we should build the map at this level. Forecasting at this level would also be interesting, provided that there are enough cases (forecasts based on few cases are unreliable).
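One way to encode the "enough cases" condition is a plain threshold on the latest cumulative count per region. This is a minimal sketch, not the dashboard's code: the `MIN_CASES` value, the function name and the counts are all hypothetical.

```python
# Hypothetical cut-off: regions with fewer cumulative confirmed cases are not
# forecast. The value 100 is an illustrative assumption, not a tuned threshold.
MIN_CASES = 100


def regions_to_forecast(latest_counts, min_cases=MIN_CASES):
    """Return the (country, province) pairs with enough cases to forecast.

    latest_counts maps (country, province) to the most recent cumulative
    confirmed-case count for that region.
    """
    return sorted(
        region for region, count in latest_counts.items() if count >= min_cases
    )


# Made-up counts for placeholder regions.
toy_counts = {
    ("Country X", "Region A"): 2500,
    ("Country X", "Region B"): 40,
    ("Country Y", "Region C"): 100,
}
print(regions_to_forecast(toy_counts))
# [('Country X', 'Region A'), ('Country Y', 'Region C')]
```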

This enhancement will require some work, but it seems a worthwhile addition to the site.

GaelVaroquaux · Mar 18 '20 02:03

I think the challenge is plotting on the map: we need the shape of each region/province. Maybe as GeoJSON?
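Since a GeoJSON FeatureCollection is plain JSON, the standard library is enough to load one and index its shapes by region. A minimal sketch, assuming each feature carries its country and region names as properties; the property names `admin` and `name` are an assumption (I believe Natural Earth's admin-1 layer uses them, but that should be checked), and the square geometries are placeholders.

```python
import json

# Placeholder GeoJSON: the property names "admin" (country) and "name"
# (region) are assumed, and the square geometries are made up.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"admin": "Country X", "name": "Region A"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"admin": "Country X", "name": "Region B"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]}}
  ]
}
"""


def index_features(collection, country_key="admin", region_key="name"):
    """Map (country, region) to the GeoJSON feature holding that region's shape."""
    index = {}
    for feature in collection["features"]:
        props = feature["properties"]
        index[(props[country_key], props[region_key])] = feature
    return index


shapes = index_features(json.loads(GEOJSON_TEXT))
print(sorted(shapes))  # [('Country X', 'Region A'), ('Country X', 'Region B')]
print(shapes[("Country X", "Region A")]["geometry"]["type"])  # Polygon
```

With plotly, such a FeatureCollection can be passed as the `geojson` argument of `go.Choropleth`, with `featureidkey` (e.g. `"properties.name"`) telling plotly which property to compare against `locations`.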

Here is a discussion about showing US states on a world map: https://community.plot.ly/t/state-boundaries-on-a-world-map-projection/11698/4

According to the following documentation, the only built-in geometries are countries and US states: https://plot.ly/python/choropleth-maps/#using-builtin-country-and-state-geometries
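For the built-in US-states geometry, plotly's choropleth expects two-letter state codes in `locations` together with `locationmode="USA-states"`, so state names from the data need translating. A sketch with a deliberately partial, hypothetical lookup table; names it does not know are returned separately rather than silently dropped.

```python
# Partial state name -> two-letter code table, for illustration only; a real
# run needs all 50 states plus DC, and a decision about territories.
STATE_CODES = {
    "California": "CA",
    "New York": "NY",
    "Texas": "TX",
    "Washington": "WA",
}


def to_state_codes(state_totals):
    """Split {state name: count} into codes, values and unmatched names."""
    locations, values, unmatched = [], [], []
    for name, count in state_totals.items():
        code = STATE_CODES.get(name)
        if code is None:
            unmatched.append(name)  # needs a manual decision (territory, typo, ...)
        else:
            locations.append(code)
            values.append(count)
    return locations, values, unmatched


# Made-up totals; "Somewhere Else" stands for any name missing from the table.
print(to_state_codes({"New York": 12, "Washington": 7, "Somewhere Else": 1}))
# (['NY', 'WA'], [12, 7], ['Somewhere Else'])
```

The first two lists could then go into `go.Choropleth(locations=locations, z=values, locationmode="USA-states")`, with `geo_scope="usa"` in the layout to zoom on the US.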

GaelVaroquaux · Mar 20 '20 10:03

I'm currently looking at http://www.naturalearthdata.com/downloads/110m-cultural-vectors/. @jorisvandenbossche, do you think it's a good resource, or would you recommend another one? (Sorry for the ping!)

emmanuelle · Mar 23 '20 17:03

If we want a quick solution, we could use the latitude/longitude information in the dataset to draw a scatter plot with one marker per (lat, lon) pair; no shapefiles needed. Of course, having the shapes is nicer, but it also adds more data to the page.
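A rough sketch of that quick solution: pull one (label, lat, lon, latest count) tuple per row of the time-series CSV with the standard `csv` module. The column layout (Province/State, Country/Region, Lat, Long, then one column per date) is assumed from the Johns Hopkins files and should be checked against the current header; the sample rows are made up.

```python
import csv
import io


def scatter_points(csv_text):
    """Return (label, lat, lon, latest count) for each row of a time-series CSV.

    Assumed layout: Province/State, Country/Region, Lat, Long, then one
    cumulative-count column per date, oldest first.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    points = []
    for row in reader:
        province, country, lat, lon = row[:4]
        label = f"{province}, {country}" if province else country
        latest = int(float(row[-1] or 0))  # empty cell -> 0
        points.append((label, float(lat), float(lon), latest))
    return points


# Two made-up rows in the assumed layout.
SAMPLE = """Province/State,Country/Region,Lat,Long,3/21/20,3/22/20
Region A,Country X,10.0,20.0,3,5
,Country Y,-5.5,30.25,0,1
"""
for point in scatter_points(SAMPLE):
    print(point)
# ('Region A, Country X', 10.0, 20.0, 5)
# ('Country Y', -5.5, 30.25, 1)
```

The lat/lon values map directly onto plotly's `go.Scattergeo(lat=..., lon=..., text=...)`, with the marker size scaled by the count.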

emmanuelle · Mar 23 '20 18:03

Natural Earth does have states/provinces, but the question remains whether they match the regions provided in the data. Is there an example of the data?
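Checking whether the regions match can start as a name comparison, country by country. A minimal sketch, assuming both sides can be reduced to (country, province) name pairs; the accent/case normalization is one possible choice, and anything it does not catch would need a manual alias table. The inputs are toy values.

```python
import unicodedata


def normalize(name):
    """Lower-case, trim and strip accents, so that 'Québec' matches 'Quebec'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.strip().lower()


def unmatched_regions(data_regions, shape_regions):
    """Return the (country, province) pairs from the data that have no shape."""
    available = {(normalize(c), normalize(p)) for c, p in shape_regions}
    return sorted(
        (c, p) for c, p in data_regions
        if (normalize(c), normalize(p)) not in available
    )


# Toy inputs: only the accent differs for Quebec; "Region Z" has no shape.
data = [("Canada", "Quebec"), ("Canada", "Region Z"), ("Australia", "Victoria")]
shapes = [("Canada", "Québec"), ("Australia", "Victoria")]
print(unmatched_regions(data, shapes))  # [('Canada', 'Region Z')]
```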

jorisvandenbossche · Mar 23 '20 19:03

Thanks a lot for your input, @jorisvandenbossche :-). An example dataset is https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

emmanuelle · Mar 23 '20 20:03

Thanks for the link. So https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/ has state and province shapes. I can check tomorrow whether it is relatively straightforward to match them. Note, though, that the COVID data for the US, for example, even come per county, not per state (although it should be easy to aggregate them per state).
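Aggregating per state could look like the sketch below, assuming county entries are written as "County name, XX" with a two-letter state code (an assumption about the format, to be checked against the data); the code table is partial and the counts are made up. If the same cases appear both in county rows and in the corresponding state row for a given date, summing them would double count, so that overlap needs checking first.

```python
from collections import defaultdict

# Partial two-letter code -> state name table, for illustration only.
CODE_TO_STATE = {"CA": "California", "NY": "New York", "WA": "Washington"}


def state_totals(us_rows):
    """Sum US (province, count) rows per state.

    Assumption: county rows look like 'King County, WA' and are folded into
    their state; rows without a comma are taken to be whole states already.
    """
    totals = defaultdict(int)
    for province, count in us_rows:
        if "," in province:
            code = province.rsplit(",", 1)[1].strip()
            state = CODE_TO_STATE.get(code, code)  # unknown codes are kept as-is
        else:
            state = province
        totals[state] += count
    return dict(totals)


# Made-up counts.
rows = [("King County, WA", 3), ("Snohomish County, WA", 2), ("New York", 4)]
print(state_totals(rows))  # {'Washington': 5, 'New York': 4}
```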

jorisvandenbossche · Mar 23 '20 20:03

Thanks for taking a look. The Johns Hopkins dataset (which we are using at the moment) only has province/state information for a handful of countries (this might change in the future):

```python
>>> countries = df['Country/Region']
>>> countries.value_counts().head(50)
US                     247
China                   33
Canada                  12
France                   9
Australia                9
United Kingdom           7
Netherlands              4
Denmark                  3
Japan                    1
```

So, for now, this correspondence must be checked for 8 countries. I'll start with the US states (county-level information is great, but state level should be fine for now), since plotly's choropleth trace already knows the geometry of US states.

emmanuelle · Mar 23 '20 20:03

Related to this: #79. We can assume that county-level data are incomplete or unreliable, so let's not use them and focus on state-level data for the US.
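Following that decision, the filtering step could be as simple as the sketch below, which keeps state-level rows and discards county-level ones. It relies on the hypothetical convention that county entries contain a comma, plus an explicit, equally hypothetical exception set for state-level names that also contain one. Counts are made up.

```python
# Comma-containing names that should still count as state level. Hypothetical;
# extend it after looking at the actual Province/State values.
STATE_LEVEL_EXCEPTIONS = {"Washington, D.C."}


def state_level_rows(us_rows):
    """Keep (province, count) rows that look state level and drop county rows.

    Assumption: county entries contain a comma ('King County, WA'); state
    entries do not, apart from the names listed in STATE_LEVEL_EXCEPTIONS.
    """
    return [
        (province, count)
        for province, count in us_rows
        if "," not in province or province in STATE_LEVEL_EXCEPTIONS
    ]


rows = [("Washington", 5), ("King County, WA", 3), ("Washington, D.C.", 1)]
print(state_level_rows(rows))  # [('Washington', 5), ('Washington, D.C.', 1)]
```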

emmanuelle · Mar 24 '20 07:03

Also see https://github.com/CSSEGISandData/COVID-19/issues/1250

emmanuelle · Mar 24 '20 08:03

In fact, region information is only useful for Canada, Australia and China. For the other countries, the regions correspond to overseas territories.
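A sketch of how the mapped units could then be built: keep provinces for the three countries named above and collapse every other country into a single country-level entry. Folding overseas territories into the country total is an assumption of this sketch; dropping those rows or plotting them separately are alternatives. Counts are made up.

```python
from collections import defaultdict

# Countries whose province/state split is kept, as listed in the comment above.
REGIONAL_COUNTRIES = {"Canada", "Australia", "China"}


def map_units(rows):
    """Aggregate (country, province, count) rows into the units to be mapped.

    Provinces are kept for REGIONAL_COUNTRIES. For every other country, all
    rows (mainland and overseas territories) are summed into a single
    (country, "") entry.
    """
    totals = defaultdict(int)
    for country, province, count in rows:
        if country in REGIONAL_COUNTRIES and province:
            key = (country, province)
        else:
            key = (country, "")
        totals[key] += count
    return dict(totals)


# Made-up counts; "Territory T" stands in for an overseas territory.
rows = [
    ("Canada", "Ontario", 3),
    ("Canada", "Quebec", 2),
    ("France", "", 10),
    ("France", "Territory T", 1),
]
print(map_units(rows))
# {('Canada', 'Ontario'): 3, ('Canada', 'Quebec'): 2, ('France', ''): 11}
```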

emmanuelle · Mar 24 '20 08:03