
Clarifying the 'under the hood' data download (and other GIS stuff)

Open jypeter opened this issue 5 years ago • 19 comments

I have just used ax_plot.add_feature(cfeature.LAND) for the first time, which triggered an external data download

/home/share/unix_files/cdat/miniconda3/envs/cdatm_py2/lib/python2.7/site-packages/cartopy/io/__init__.py:260: DownloadWarning: Downloading: http://naciscdn.org/naturalearth/110m/physical/ne_110m_land.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)

This went quite well, but it reminded me that such a download recently failed when using cartopy on a supercomputer with no outside world access. I was in a hurry and did not write down the error message, so I can't say if it was friendly and informative or not (I figured out what it was about, and a workaround, but you want to make this easy for all users)

Anyway, I think it would be nice to add a page somewhere about:

  • which data is available by default
  • what might be downloaded and where (in my case, the data ended up under cartopy.config['data_dir'], in shapefiles/natural_earth/physical; more precisely, in the hidden directory ~/.local/share/cartopy/shapefiles/natural_earth/physical)
  • some recipe to pre-download the data on a computer with network access, and copy it to the appropriate directory on a supercomputer
  • a short summary (or links to appropriate external web sites) of how this shapefiles/features/layers/WMS stuff works, to help modellers who are not familiar with GIS

And you could then add a link to this new page from The cartopy Feature interface page and other relevant places
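
A minimal sketch of the on-disk layout mentioned above, assuming Natural Earth shapefiles are stored as shapefiles/natural_earth/<category>/ne_<resolution>_<name>.shp under the data directory (inferred from the download URL and the directory reported in this post). The helper name `expected_shapefile_path` is hypothetical and not part of cartopy; pass it the value of cartopy.config['data_dir'] from your own machine.

```python
import os


def expected_shapefile_path(data_dir, category, resolution, name):
    """Return where a Natural Earth shapefile is expected under data_dir.

    Hypothetical helper (not a cartopy API); the layout is an assumption
    based on the paths reported in this thread.
    """
    filename = "ne_{}_{}.shp".format(resolution, name)
    return os.path.join(data_dir, "shapefiles", "natural_earth", category, filename)


# Example: the file behind cfeature.LAND at 110m resolution.
data_dir = os.path.expanduser("~/.local/share/cartopy")
print(expected_shapefile_path(data_dir, "physical", "110m", "land"))
```

The matching .shx and .dbf files sit next to the .shp file and need to be copied along with it.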

jypeter avatar Jun 14 '19 09:06 jypeter

I second this. We plan to use cartopy for a flight planning tool used in remote locations with bad internet connection and want to automatically install it and the necessary data fully before going abroad.

joernu76 avatar Jun 24 '19 15:06 joernu76

@jypeter I just encountered the same issue working on a supercomputer. Did you happen to figure out a workaround for this?

cpatrizio88 avatar Jul 29 '19 18:07 cpatrizio88

+1 for this. I recently tried to guide a supercomputer user through this issue by debugging and picking apart the various functions. I missed some steps, confusing the situation

So a thorough guide would be very helpful

trexfeathers avatar Jul 30 '19 08:07 trexfeathers

I had completely forgotten about that question until you guys asked about it yesterday, and I also had to plot a map on my laptop during a conference and it triggered the download on my laptop (WiFi was fortunately working). I have just tried to find out what was downloaded:

  • nothing new was installed in the anaconda Python installation (even though I had write access to it)
  • the extra files below were installed in the .local directory of my home directory

(cdatm_py2) jypeter@lsce4078:~$ ls -ltrRh ~/.local/share/cartopy/shapefiles
/home/jypeter/.local/share/cartopy/shapefiles:
total 0
drwxrwxrwx 1 jypeter jypeter 512 Jul 29 16:09 natural_earth

/home/jypeter/.local/share/cartopy/shapefiles/natural_earth:
total 0
drwxrwxrwx 1 jypeter jypeter 512 Jul 29 16:10 physical

/home/jypeter/.local/share/cartopy/shapefiles/natural_earth/physical:
total 96K
-rw-rw-rw- 1 jypeter jypeter  88K Jul 29 16:10 ne_110m_coastline.shp
-rw-rw-rw- 1 jypeter jypeter 3.7K Jul 29 16:10 ne_110m_coastline.dbf
-rw-rw-rw- 1 jypeter jypeter 1.2K Jul 29 16:10 ne_110m_coastline.shx

So the workaround would probably be to copy those files from a computer with network access to the .local directory of all the users who need them on a supercomputer. Can somebody give this a try?

I'd rather have a solution where cartopy first checks if the data files are available in a 'conda installation' centralized location, and then checks the ~/.local directory
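
The lookup order wished for here can be sketched as follows. This only illustrates the idea and is not cartopy's actual implementation; `find_data_file` and the shared directory name in the example are hypothetical.

```python
import os


def find_data_file(relative_path, search_dirs):
    """Return the first existing copy of relative_path, or None.

    search_dirs is an ordered list, e.g. a shared read-only location
    first and the per-user ~/.local/share/cartopy directory second.
    """
    for base in search_dirs:
        candidate = os.path.join(base, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None  # nothing found locally; a download would be needed


rel = os.path.join("shapefiles", "natural_earth", "physical", "ne_110m_coastline.shp")
search_dirs = ["/opt/shared/cartopy", os.path.expanduser("~/.local/share/cartopy")]
print(find_data_file(rel, search_dirs))
```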

@bjlittle any ideas here?

jypeter avatar Jul 30 '19 09:07 jypeter

@jypeter I have all of the files in ~/.local/share/cartopy/shapefiles/natural_earth/physical

and still no luck unfortunately. I've verified that my cartopy.config['data_dir'] is pointing to that directory as well.

cpatrizio88 avatar Jul 30 '19 17:07 cpatrizio88

#1072 is relevant here too. Looks like they had some success with the above method, but it's not working for me...

cpatrizio88 avatar Jul 30 '19 17:07 cpatrizio88

@cpatrizio88 maybe you can try the download tool mentioned in Location of stored offline data for cartopy

We probably have to play with this, and possibly use pre_existing_data_dir for having multiple users point to the same data location.

However, when installing cartopy with conda, I'm not too sure how to initialize cleanly pre_existing_data_dir for everybody using this conda install without overwriting source code installed by conda, even after reading the cartopy.config documentation

I still think there should be a documentation page on the cartopy site connecting all the dots for this. Or maybe it is there and I have not found it yet

jypeter avatar Aug 08 '19 09:08 jypeter

@jypeter thanks for your suggestion. I got this working using the following steps:

  1. Make sure cartopy.config['data_dir'] is set to '~/.local/share/cartopy'

  2. Place the following files in ~/.local/share/cartopy/shapefiles/natural_earth/physical/:

ne_110m_coastline.dbf
ne_110m_coastline.shp
ne_110m_coastline.shx
ne_110m_land.dbf
ne_110m_land.shp
ne_110m_land.shx

And that's all! This will allow m.add_feature(cart.feature.LAND) and m.add_feature(cart.feature.COASTLINE) to work offline. I'm sure the steps are similar for other cartopy features.

Just a note that setting the cartopy.config['data_dir'] to exactly where the shape files are being stored did not work for me. This made the problem more difficult than necessary I think.
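
The steps above can be checked before going offline with a small script. This is a sketch under the assumption, consistent with the note above, that data_dir is the root directory and the files live in shapefiles/natural_earth/<category> beneath it; `missing_feature_files` is a hypothetical helper, not a cartopy function.

```python
import os

# The three files listed above for each feature.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_feature_files(data_dir, category, resolution, names):
    """List expected Natural Earth files that are absent under data_dir."""
    folder = os.path.join(data_dir, "shapefiles", "natural_earth", category)
    missing = []
    for name in names:
        for ext in REQUIRED_EXTENSIONS:
            path = os.path.join(folder, "ne_{}_{}{}".format(resolution, name, ext))
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# data_dir is the root, not the 'physical' folder itself.
data_dir = os.path.expanduser("~/.local/share/cartopy")
for path in missing_feature_files(data_dir, "physical", "110m", ["coastline", "land"]):
    print("missing:", path)
```

An empty result means all six files are in place for LAND and COASTLINE at 110m resolution.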

Your point about having multiple users point to the same data directory is well taken though. I was just happy to get it working for myself for now.

cpatrizio88 avatar Aug 09 '19 19:08 cpatrizio88

Just here to say that setting cartopy.config['data_dir'] does work fine for me. I'm doing this with a Docker build, only downloading the 10m coastlines. Here's a snippet of my Dockerfile:

# Download some NaturalEarth data for cartopy
ENV CARTOPY_DIR=/usr/local/cartopy-data
ENV NE_PHYSICAL=${CARTOPY_DIR}/shapefiles/natural_earth/physical
RUN mkdir -p ${NE_PHYSICAL}
RUN wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_coastline.zip -P ${CARTOPY_DIR}
RUN apt-get -yq install unzip
RUN unzip ${CARTOPY_DIR}/ne_10m_coastline.zip -d  ${NE_PHYSICAL}
RUN rm ${CARTOPY_DIR}/*.zip

And then in Python:

import os
import cartopy
cartopy.config['data_dir'] = os.getenv('CARTOPY_DIR', cartopy.config.get('data_dir'))

And then the repeated download is avoided (except for the initial one when the image is built).
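
For setups without Docker, the unzip step of the Dockerfile can be sketched in Python with the standard library only. The function name `unpack_natural_earth_zip` is hypothetical; the target layout is the one used by the Dockerfile above.

```python
import os
import zipfile


def unpack_natural_earth_zip(zip_source, data_dir, category):
    """Extract a Natural Earth zip into data_dir's shapefile layout.

    zip_source may be a path or a binary file-like object.
    Returns the names of the archive members that were extracted.
    """
    target = os.path.join(data_dir, "shapefiles", "natural_earth", category)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_source) as archive:
        archive.extractall(target)
        return archive.namelist()


# Usage on a machine with network access (URL as in the Dockerfile above):
#   import io
#   from urllib.request import urlopen
#   data = urlopen(url).read()
#   unpack_natural_earth_zip(io.BytesIO(data), "/usr/local/cartopy-data", "physical")
```

The resulting directory can then be copied to the offline machine and used as cartopy.config['data_dir'].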

alpha-beta-soup avatar Sep 02 '19 22:09 alpha-beta-soup

The above instructions haven't worked for me and this is a problem. I have several big customers who absolutely can't connect to the internet and need mapping features.

They have the maps in their .local/share/cartopy/... directory, I've verified that. I've tried setting the pre_existing_data_dir and that didn't seem to do anything.

And I second jypeter's suggestion that the documentation on this is lacking.

Here's their stack trace. There might be formatting strangeness, as I had to convert a PDF of a scan to text...

/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/io/__init__.py:260: DownloadWarning: Downloading: http://naciscdn.org/naturalearth/110m/physical/ne_110m_ocean.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
Traceback (most recent call last):
  File "/opt/anaconda/5.3.0/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/opt/anaconda/5.3.0/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/anaconda/5.3.0/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/anaconda/5.3.0/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/anaconda/5.3.0/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/opt/anaconda/5.3.0/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/opt/anaconda/5.3.0/lib/python3.6/http/client.py", line 936, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/opt/anaconda/5.3.0/lib/python3.6/socket.py", line 704, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/opt/anaconda/5.3.0/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/myWidget.py", line 240, in update_detections
    self.clear_plot()
  File "/myWidget.py", line 226, in clear_plot
    self.fig.canvas.draw()
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/backends/backend_agg.py", line 437, in draw
    self.figure.draw(self.renderer)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/figure.py", line 1493, in draw
    renderer, self, artists, self.suppressComposite)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/image.py", line 141, in _draw_list_compositing_images
    a.draw(renderer)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/mpl/geoaxes.py", line 385, in draw
    inframe=inframe)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 2635, in draw
    mimage._draw_list_compositing_images(renderer, self, artists)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/image.py", line 141, in _draw_list_compositing_images
    a.draw(renderer)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/mpl/feature_artist.py", line 137, in draw
    geoms = self._feature.intersecting_geometries(extent)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/feature.py", line 120, in intersecting_geometries
    return (geom for geom in self.geometries() if
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/feature.py", line 191, in geometries
    name=self.name)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/io/shapereader.py", line 265, in natural_earth
    return ne_downloader.path(format_dict)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/io/__init__.py", line 222, in path
    result_path = self.acquire_resource(target_path, format_dict)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/io/shapereader.py", line 320, in acquire_resource
    shapefile_online = self._urlopen(url)
  File "/opt/anaconda/5.3.0/lib/python3.6/site-packages/cartopy/io/__init__.py", line 261, in _urlopen
    return urlopen(url)
  File "/opt/anaconda/5.3.0/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/anaconda/5.3.0/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/opt/anaconda/5.3.0/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/opt/anaconda/5.3.0/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/opt/anaconda/5.3.0/lib/python3.6/urllib/request.py", line 1346, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/opt/anaconda/5.3.0/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
Abort (core dumped)

SpacelySpaceSprockets avatar Sep 10 '19 17:09 SpacelySpaceSprockets

Note that if the 'pre_existing_data_dir' config key is set in ../cartopy/__init__.py, that is the first location cartopy looks in. If you set that key on a non-networked machine, it should prevent any download attempt, as long as the shapefiles cartopy is looking for have been copied to that directory. You can of course also set it inline in each script with cartopy.config['pre_existing_data_dir'] = path. I personally had issues getting this to work with the conda-forge build on Windows, documented in #1435. Using Christoph Gohlke's build for Windows, it works fine. I don't have access to a non-Windows machine to test, but given my recent difficulties with this same issue on Windows, something tells me there's a conda recipe issue somewhere along the way.

pzsamscore avatar Jan 08 '20 15:01 pzsamscore

Hello guys, today I ran into the same problem as you while deploying an application that uses Cartopy on a server. I tried downloading the .zip file and unzipping it in the respective directory (as I had seen in slacks), but that was not successful. As another attempt, I took the files from the same directory on my own machine (which runs Windows) and copied them to the server directory, and oddly enough it worked for me.

"ne_110m_coastline.dbf, ne_110m_coastline, ne_110m_coastline.shx, ne_110m_land.dbf, ne_110m_land, ne_110m_land.shx, ne_110m_ocean.dbf, ne_110m_ocean and ne_110m_ocean.shx."

So, if you want to test, open the directory in Windows and copy the files to the respective application folder you want, to gain access to the files offline.

Anderson3 avatar May 15 '20 03:05 Anderson3

Hello, I had the same issue in a cluster with no outside network connection. The best solution for me is to download data using tools/feature_download.py script, copy it in a folder (lib/python3.7/site-packages/Cartopy-0.18.0-py3.7-linux-x86_64.egg/cartopy/data in my case) and then modify __init__.py adding that folder.

config = {'pre_existing_data_dir': 'YOUR_FOLDER',
          'data_dir': _data_dir,
          'repo_data_dir': os.path.join(os.path.dirname(__file__), 'data'),
          'downloaders': {},
          }

Then all users of the Cartopy package can plot their maps without problems.
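
Where editing the installed __init__.py is undesirable, a similar effect might be achieved through the cartopy_userconfig hook that is used further down in this thread. A minimal sketch, assuming the installed cartopy version still imports a `cartopy_userconfig` package and calls its `update_config(config)`; the environment variable name `CARTOPY_SHARED_DATA` is made up for this example.

```python
import os


def update_config(config):
    """Point cartopy at a shared, pre-populated data directory.

    CARTOPY_SHARED_DATA is a hypothetical variable name chosen for this
    sketch; when it is unset, the configuration is left untouched.
    """
    shared = os.environ.get("CARTOPY_SHARED_DATA")
    if shared:
        config["pre_existing_data_dir"] = shared
```

Saved as cartopy_userconfig/__init__.py on the Python path, this keeps the shared location out of the installed source, so a package upgrade does not overwrite it.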

kserradell avatar Jul 16 '20 09:07 kserradell

We have developed a workaround for supplying the map files to systems that cannot download them. However, it looks like the webpage that supplies these images has been down all day: https://naciscdn.org. It would be nice if cartopy could be obtained with these files included, so that we don't have to rely on an internet connection and on this webpage being up to get everything we need to run.

Update: I learned of the script that is provided with the cartopy source code to obtain the maps. I also found the new location of the files that are automatically downloaded by cartopy: https://naturalearth.s3.amazonaws.com i.e. https://naturalearth.s3.amazonaws.com/110m_cultural/110m_cultural.zip

Will a patch be issued to cartopy to update these URLs?

georgemccabe avatar Aug 23 '21 23:08 georgemccabe

@georgemccabe Not sure whether this S3 bucket is meant to replace the existing CDN - I think it's just an alternate source. That being said, my bet is always that S3 will be more reliable than just about any other source, so I'm updating my build systems to drop in this custom cartopy config:

_SOURCE_TEMPLATE = 'https://naturalearth.s3.amazonaws.com/{resolution}_{category}/ne_{resolution}_{name}.zip'

def update_config(config):
    """Configures cartopy to download NaturalEarth shapefiles from S3 instead
    of naciscdn."""
    from cartopy.io.shapereader import NEShpDownloader
    target_path_template = NEShpDownloader.default_downloader().target_path_template
    downloader = NEShpDownloader(url_template=_SOURCE_TEMPLATE,
                                 target_path_template=target_path_template)
    config['downloaders'][('shapefiles', 'natural_earth')] = downloader

My deployment method:

usersitedir=$(python -c 'from __future__ import print_function; import site; print(site.getusersitepackages())')
mkdir -p "$usersitedir"/cartopy_userconfig
cp THE_PYTHON_SCRIPT_ABOVE.py "$usersitedir"/cartopy_userconfig/__init__.py
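
To see which URL the template above produces for a given feature, it can be formatted by hand. This sketch only exercises Python string formatting with the same placeholder names; the helper `natural_earth_url` is hypothetical, and it is an assumption that cartopy fills the template with exactly these values.

```python
_SOURCE_TEMPLATE = ('https://naturalearth.s3.amazonaws.com/'
                    '{resolution}_{category}/ne_{resolution}_{name}.zip')


def natural_earth_url(resolution, category, name):
    """Fill the S3 URL template for one Natural Earth dataset."""
    return _SOURCE_TEMPLATE.format(resolution=resolution,
                                   category=category, name=name)


print(natural_earth_url('110m', 'physical', 'land'))
# -> https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_land.zip
```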

acarapetis avatar Aug 24 '21 05:08 acarapetis

@acarapetis, reading through this issue, I believe it is meant to replace it: https://github.com/nvkelso/natural-earth-vector/issues/445 It looks like it has been approved and merged on the AWS side: https://github.com/awslabs/open-data-registry/pull/853 It would be great if someone wants to submit a PR to update the download scripts to point to the new URL.

greglucas avatar Aug 24 '21 14:08 greglucas

Thanks for an excellent library! I had a question w.r.t. downloading the ne_shaded vhigh raster. When I run this command python cartopy_feature_download.py physical --output ., I only seem to download shapefiles and not any rasters. How do I get those? They do not seem to be downloaded automatically.

ritviksahajpal avatar Feb 21 '22 02:02 ritviksahajpal

Related to this issue, the feature download script help is wrong about the default location:

https://github.com/SciTools/cartopy/blob/c184eadc1dfc458fd102e668084667dd9c0efa59/tools/cartopy_feature_download.py#L115-L117

It's the user data dir, not the user cache dir. And even then, it's only the user data dir according to XDG, not (necessarily) according to the OS. That is ok, but an alternative could be to use platformdirs.

In any case, as the OP suggests, it could be useful to document somewhere what cartopy.config['data_dir'] defaults to, so people don't have to look through the code to find out.
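
As an illustration of the XDG-style default described above, here is a sketch that assumes the default is $XDG_DATA_HOME/cartopy with a fallback to ~/.local/share/cartopy, which matches the paths reported earlier in this thread. `default_data_dir` is a hypothetical helper; the authoritative value is whatever cartopy.config['data_dir'] reports at runtime.

```python
import os


def default_data_dir(environ=None, home=None):
    """Guess the default data directory following the XDG convention.

    Assumption for this sketch: XDG_DATA_HOME is honoured when set,
    otherwise ~/.local/share is used, with 'cartopy' appended.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    base = environ.get("XDG_DATA_HOME", os.path.join(home, ".local", "share"))
    return os.path.join(base, "cartopy")


print(default_data_dir())
```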

zmoon avatar Jun 15 '22 19:06 zmoon

I have a general question (hopefully related to this thread). Is it possible to use the conda-forge offline data package ( https://anaconda.org/conda-forge/cartopy_offlinedata ) to provide the offline data? If so, how can I load it later? I'm an HPC user building a Docker container to be used as an Apptainer source, which will then be run on a cluster with no internet access...

ricardobarroslourenco avatar Mar 21 '23 22:03 ricardobarroslourenco