eeweather
Expansion of weather stations included
In exploring the functionality, it seems that the majority of the stations included are in the U.S. and Australia. It would be great if data for weather stations in other regions of the world could be included. I am particularly interested in the UK and other portions of Europe but it would be great to explore the possibilities of expansion to cover as much of the world as possible.
The weather data is sourced from NOAA (ftp://ftp.ncdc.noaa.gov/pub/data/noaa, although the site is unfortunately down at the moment), which has access to stations globally. What we do is build a database that contains only certain weather stations, using the `eeweather rebuild_db` command. This command limits the database to US and Australian stations, as you can see here: https://github.com/openeemeter/eeweather/blob/master/eeweather/database.py#L186
The `eeweather rebuild_db` command stores a sqlite3 db file that contains only the stations selected by the filter on that line. If you would like to expand the set of stations, you could play around with adjusting that line, or possibly add a CLI parameter to the `rebuild_db` call so that countries can be passed in. It may be tough to do while NOAA is down, but it should hopefully be back up in the next few days. If you get it working, please consider submitting it as a pull request!
Let us know if you need any help navigating the code.
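As a rough illustration of the "countries as a parameter" idea, here is a minimal sketch using only the standard library rather than the pandas masks in `database.py`. The function name `filter_stations`, the sample rows, and the exact meaning of the geo and USAF checks are assumptions made for illustration, not the eeweather implementation. The country codes follow what the NOAA station list appears to use ("AS" for Australia, "UK" for the United Kingdom).

```python
import csv
import io

# Hypothetical sample in the column layout of NOAA's isd-history.csv.
# The rows are made up for illustration.
SAMPLE_ISD_HISTORY = """\
USAF,WBAN,STATION NAME,CTRY,LAT,LON,BEGIN,END
111111,99999,EXAMPLE US,US,40.0,-105.0,19730101,20200101
222222,99999,EXAMPLE AUS,AS,-33.9,151.2,19800101,20200101
333333,99999,EXAMPLE UK,UK,51.5,-0.1,19730101,20200101
444444,99999,EXAMPLE NO GEO,UK,,,19730101,20200101
999999,12345,EXAMPLE NO USAF,US,35.0,-90.0,19730101,20200101
"""


def filter_stations(csv_text, countries=("US", "AS")):
    """Return rows that have coordinates, a USAF id, and an allowed country.

    Assumption: "has a USAF id" means the id is not the 999999 placeholder.
    """
    allowed = set(countries)
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        has_geo = bool(row["LAT"]) and bool(row["LON"])
        has_usaf = row["USAF"] != "999999"
        if has_geo and has_usaf and row["CTRY"] in allowed:
            rows.append(row)
    return rows
```

Calling `filter_stations(SAMPLE_ISD_HISTORY, countries=("UK",))` would keep only the UK row that has coordinates. A CLI option could pass its list of codes straight into a parameter like `countries`.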
@DLDonaldson @ssuffian I did a quick experiment to see how big the metadata database gets when it includes all of the weather stations in the world. It grows from 11.7 MB to 25 MB, which is probably reasonable and still well below the PyPI package size limit, which would be our upper bound. It's a little big for a Python package, and we could probably do some work to slim it down or separate it from the library itself, but I think I could be convinced to move to worldwide support. When I get a chance I will create a branch or tag with that change so that we can test it out in practice.
@philngo I'm trying to expand the list of weather stations to include NZ and Canada (worldwide as mentioned in this issue would be great but I wanted to just start with what I need). I managed to get it almost working with the following changes:
- Delete `eeweather/eeweather/resources/metadata.db`.
- Change `database.py` as shown in this diff:
diff --git a/eeweather/database.py b/eeweather/database.py
index 8466f9f..68e406b 100644
--- a/eeweather/database.py
+++ b/eeweather/database.py
@@ -181,9 +181,11 @@ def _load_isd_station_metadata(download_path):
)
isAus = isd_history.CTRY == "AS"
+ isCan = isd_history.CTRY == "CA"
+ isNZ = isd_history.CTRY == "NZ"
metadata = {}
- for usaf_station, group in isd_history[hasGEO & hasUSAF & (isUS | isAus)].groupby("USAF"):
+ for usaf_station, group in isd_history[hasGEO & hasUSAF & (isUS | isAus | isCan | isNZ)].groupby("USAF"):
# find most recent
recent = group.loc[group.END.idxmax()]
wban_stations = list(group.WBAN)
- Find a new source for the `CA_Building_Standards_Climate_Zones.zip` file, which is missing from the official ca.gov site. I know the place I found it probably isn't a long-term solution to this problem, but I couldn't rebuild the database without this file.
diff --git a/scripts/create_ca_climate_zone_geojson.sh b/scripts/create_ca_climate_zone_geojson.sh
index 145711f..732408b 100755
--- a/scripts/create_ca_climate_zone_geojson.sh
+++ b/scripts/create_ca_climate_zone_geojson.sh
@@ -5,7 +5,7 @@ DATA_DIR=${1:-data}
mkdir -p $DATA_DIR
# download and install CA climate zone raw data
-wget -N http://ww2.energy.ca.gov/maps/renewable/CA_Building_Standards_Climate_Zones.zip -P $DATA_DIR -q --show-progress
+wget -N https://community.esri.com/servlet/JiveServlet/download/176380-1-158805/CA_Building_Standards_Climate_Zones.zip -P $DATA_DIR -q --show-progress
unzip -q -o $DATA_DIR/CA_Building_Standards_Climate_Zones.zip -d $DATA_DIR
# reproject to ESRI Shapefile
- After those changes I ran the `eeweather rebuild-db` command from inside the shell of my Docker image and it worked. I am able to get weather stations and weather data for NZ and Canada.
In making all these changes I somehow broke the `is_tmy3=True` parameter in `eeweather.rank_stations()` (even when I am looking for stations in the US). If I pass `is_tmy3=True` the response is `(None, [EEWeatherWarning(qualified_name=eeweather.no_weather_station_selected)])` regardless of where I look for a weather station. Is there some step that I overlooked when rebuilding the database that might have broken this?
@philngo I did some work earlier this year to slim down the number of stations worldwide based on the duration of the history and the amount of data available for each station. That might be a good way to reduce the overall number of stations in moving to worldwide support if you want to filter it down somewhat. If we were to expand the worldwide coverage that might simultaneously address the issue raised by @bhough199.
Also perhaps the TMY3 problem is a result of #63.
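To make the "filter by duration of history" idea concrete, here is a minimal sketch. It assumes each station record carries `BEGIN` and `END` dates as `YYYYMMDD` strings, as in NOAA's station history list. The helper names, the sample stations, and the 10-year threshold are hypothetical and are not the criteria actually used in the work mentioned above. Filtering by the amount of data available would need a separate data source and is not shown.

```python
from datetime import datetime


def years_of_record(begin, end):
    """Length of a station's record in years, from YYYYMMDD strings."""
    start = datetime.strptime(begin, "%Y%m%d")
    stop = datetime.strptime(end, "%Y%m%d")
    return (stop - start).days / 365.25


def long_record_stations(stations, min_years=10):
    """Keep stations whose record spans at least `min_years` (hypothetical cutoff)."""
    return [
        s for s in stations
        if years_of_record(s["BEGIN"], s["END"]) >= min_years
    ]


# Made-up station records for illustration only.
stations = [
    {"USAF": "111111", "BEGIN": "19730101", "END": "20200101"},
    {"USAF": "222222", "BEGIN": "20180101", "END": "20200101"},
]
```

With the sample above, `long_record_stations(stations)` keeps only the first station, since the second has roughly two years of history.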
@DLDonaldson Worldwide coverage is definitely something I am interested in pursuing, but I'll need some support to move it forward. I had considered at one point making a download step that downloads the whole database, or whichever part of the database is necessary for your task, which would decouple it from the PyPI release schedule. Filtering things down also seems like a pretty reasonable approach.
@bhough199 Thanks for sharing what you did to get the rebuild working again; that will help other power users figure out how to rebuild from source. There was a step in the database build which I think scraped the old NREL site for the TMY3 station metadata, and it's possible that it is also now broken. Let us know if #65 fixes your issue.
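One way to check whether a rebuild silently skipped a step is to list every table in the rebuilt sqlite3 file with its row count and look for tables that came out empty. The sketch below uses only the standard `sqlite3` module and discovers table names from `sqlite_master`, so it makes no assumption about the eeweather schema. The function name `table_row_counts` is hypothetical.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a sqlite3 file."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself; quote them as identifiers.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

For example, `table_row_counts("eeweather/resources/metadata.db")` (adjust the path to your checkout) would show at a glance whether any TMY3-related table has zero rows after the rebuild.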
@philngo I tested with the newest release after you fixed #65, but I am unfortunately still getting the same problem with `eeweather.rank_stations()`.
Also, in case anyone else is trying my approach, the link to the `CA_Building_Standards_Climate_Zones.zip` file that I showed in my earlier comment has changed. I knew it wasn't a reliable link, and I suggest hosting this file in the same place you put the TMY3 weather data, since it is no longer available from the official source. The new link I found today is https://community.esri.com/ccqpr47374/attachments/ccqpr47374/coordinate-reference-systemsforum-board/1814/1/CA_Building_Standards_Climate_Zones.zip
@bhough199 Would you mind opening a new issue for this out-of-date source problem so we can track it separately from weather station expansion? I think it is a good idea to host our own copies of the source files to guard against this happening again; perhaps we can track that work in the new issue. I would appreciate help tracking down any other sources that are out of date, if you find any. One that may help solve the current `rank_stations` issue is this one (untested), which I believe you should be able to use from archive.org in the rebuilding step: http://web.archive.org/web/20181119091712/https://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/tmy3/by_USAFN.html