Issue with some zipped vector data
What happened?
We are having issues opening some zipped vector data.
In the snippet below, earthkit.data works fine with version="rgi_6_0", but it raises an error with version="rgi_7_0".
What are the steps to reproduce the bug?
import earthkit.data

dataset = "insitu-glaciers-extent"
request = {
    "variable": "glacier_area",
    "product_type": "vector",
}

for version in ("rgi_6_0", "rgi_7_0"):
    ds = earthkit.data.from_source("cds", dataset, request | {"version": version})
    try:
        df = ds.to_pandas()
    except Exception as exc:
        print(f"{version = }: {exc!s}")
        raise
    else:
        print(f"{version = }: OK!")
Version
0.12.1
Platform (OS and architecture)
Darwin MacBook-Pro-di-Bopen.local 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:24:24 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6030 arm64
Relevant log output
Unknown file type, no reader available. path=/var/folders/z4/9f32__x92kl340wxp0m4hfym0000gp/T/tmp4_wsms_8/cds-023bde4eed526ccb72379966cbf08d5cfebea278f82f2143efc49ec801b34338.d/rgi2000_v70_vector.shp magic=b"\x00\x00'\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00$\xceo\xa4\xe8\x03\x00\x00\x0f\x00\x00\x00#h\xcc$\xea}f\xc0;\x1c]\xa5\xbb\x93S\xc0\x96A\xb5\xc1\txf@\x96\xcc\xb1\xbc" content_type=None
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[1], line 12
10 ds = earthkit.data.from_source("cds", dataset, request | {"version": version})
11 try:
---> 12 df = ds.to_pandas()
13 except Exception as exc:
14 print(f"{version = }: {exc!s}")
File ~/miniforge3/envs/earthkit-data/lib/python3.11/site-packages/earthkit/data/core/__init__.py:50, in Base.to_pandas(self, **kwargs)
47 @abstractmethod
48 def to_pandas(self, **kwargs):
49 """Converts into a pandas dataframe"""
---> 50 self._not_implemented()
File ~/miniforge3/envs/earthkit-data/lib/python3.11/site-packages/earthkit/data/core/__init__.py:155, in Base._not_implemented(self)
153 if hasattr(self, "path"):
154 extra = f" on {self.path}"
--> 155 raise NotImplementedError(f"{module}.{name}.{func}(){extra}")
NotImplementedError: earthkit.data.sources.empty.EmptySource.to_pandas()
Accompanying data
No response
Organisation
B-Open/EQC
@malmans2, thank you for reporting this issue. When I try to run it with:
earthkit-data develop, cdsapi 0.7.4
I get the following error for both versions:
HTTPError: 400 Client Error: Bad Request for url: https://cds.climate.copernicus.eu/api/retrieve/v1/processes/insitu-glaciers-extent/execution
invalid request
Request has not produced a valid combination of values, please check your selection.
{'variable': 'glacier_area', 'product_type': 'vector', 'version': 'r'}
Are you sure you are running the exact same snippet I sent you?
Looks like your version is incorrect: 'version': 'r'. It should be either "rgi_6_0" or "rgi_7_0".
Maybe you have a bug in your code and you are iterating over a string rather than an iterable of strings?
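For illustration, a value like 'version': 'r' is what Python produces when the loop runs over a bare string, for example if the tuple is reduced to a single parenthesised string without a trailing comma. A minimal sketch (the names `versions_str` and `versions_tuple` are made up for this example):

```python
# Parentheses alone do not make a tuple: this is just the string "rgi_6_0".
versions_str = ("rgi_6_0")
# A trailing comma (or more than one item) makes a real tuple of strings.
versions_tuple = ("rgi_6_0",)

# Iterating over a string yields its characters one by one.
first_from_str = next(iter(versions_str))
# Iterating over a tuple yields each complete version string.
first_from_tuple = next(iter(versions_tuple))

print(f"{first_from_str!r} vs {first_from_tuple!r}")  # 'r' vs 'rgi_6_0'
```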
I am sorry, you were right: I used the wrong request. Now I am able to reproduce the error.
@malmans2, earthkit-data cannot read the retrieved shapefile for the "rgi_7_0" version because it cannot identify it as a valid shapefile. A shapefile consists of multiple files, and earthkit-data expects three of them to be present, with the following suffixes:
MANDATORY = (".shp", ".shx", ".dbf")
If we look at the content of the downloaded data after it is extracted into a directory, we see this:
260805380 10 Feb 10:52 rgi2000 v70_vector.dbf
145 10 Feb 10:52 rgi2000_v70_vector.prj
1235017544 10 Feb 10:52 rgi2000_v70_vector.shp
2196348 10 Feb 10:52 rgi2000_v70_vector.shx
Here the filename rgi2000 v70_vector.dbf is incorrect because it contains a space instead of an underscore: "2000 v70" instead of "2000_v70".
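To check an extracted directory for this kind of mismatch, something along these lines may help. It is only a sketch and not earthkit-data's actual code: the helper `missing_sidecars` is hypothetical, and it assumes the companion files must share the exact stem of the `.shp` file.

```python
from pathlib import Path

# Suffixes quoted above as mandatory for a shapefile.
MANDATORY = (".shp", ".shx", ".dbf")


def missing_sidecars(shp_path):
    """Return the mandatory suffixes for which no file shares the .shp stem."""
    shp_path = Path(shp_path)
    return [
        suffix
        for suffix in MANDATORY
        # with_suffix() swaps ".shp" for the suffix being checked.
        if not shp_path.with_suffix(suffix).exists()
    ]
```

Run against a directory with the four files listed above, `missing_sidecars(".../rgi2000_v70_vector.shp")` would report `.dbf` as missing, because the only `.dbf` file has a different stem.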
I can see 2 possible solutions to this:
- the filename is fixed on the CDS side
- earthkit-data relaxes its checks and allows small differences in the shapefile filenames
Got it, thanks for the details. I will inform the EQC evaluator and the CDS technical officer.
@malmans2, I presume this issue can be closed. Please reopen it if there is anything to do on the earthkit side.