
OSError: [Errno -128] NetCDF: Attempt to use feature that was not turned on when netCDF was built.

Open cryptoboxcomics opened this issue 2 years ago • 30 comments

Attempting to open zarr file in an S3 bucket using the following python code:

from netCDF4 import Dataset
nc2 = Dataset("s3://mybucket/zarr_key/#mode=zarr,s3", "r")

When I attempt to open this, I get the following traceback:

Traceback (most recent call last):
  File "printnetcdf.py", line 8, in <module>
    nc2 = Dataset("s3://mybucket/zarr_key/#mode=zarr,s3", "r")
  File "src/netCDF4/_netCDF4.pyx", line 2353, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 1963, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -128] NetCDF: Attempt to use feature that was not turned on when netCDF was built.: b's3://mybucket/zarr_key/#mode=zarr,s3'

cryptoboxcomics avatar Jul 25 '22 20:07 cryptoboxcomics

This error is odd:

OSError: [Errno -128] NetCDF: Attempt to use feature that was not turned on when netCDF was built.: b's3://mybucket/zarr_key/#mode=zarr,s3'

It indicates that netcdf-c library was not built with (nc)zarr support enabled. Can you check the library installation, perhaps looking at libnetcdf.settings file?

DennisHeimbigner avatar Jul 25 '22 20:07 DennisHeimbigner

@cryptoboxcomics How did you install netcdf4-python? Conda? Pip?

dopplershift avatar Jul 25 '22 23:07 dopplershift

netcdf4 was installed using pip3. I was able to run this just fine with a local file store like "file:///directory/to/my/zarr/#mode=nczarr,zarr,file", but it seems like I'm only getting this issue when trying to access S3.
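For what it's worth, the difference between the two URLs is only in the fragment: netCDF-C treats everything after `#` as `key[=value]` pairs, with `mode` selecting the storage/format drivers. A quick stdlib-only sketch of that decomposition (the helper name is made up for illustration):

```python
# Stdlib-only sketch of netCDF-C's URL "fragment" syntax: everything after
# '#' is a list of key[=value] pairs, and 'mode' carries a comma-separated
# list of drivers (zarr, nczarr, s3, file, ...).
from urllib.parse import urlparse

def parse_nc_fragment(url: str) -> dict:
    """Split a netCDF URL into its base part and parsed fragment pairs."""
    parsed = urlparse(url)
    frag = {}
    for item in parsed.fragment.split("&"):
        if not item:
            continue
        key, _, value = item.partition("=")
        frag[key] = value.split(",") if value else True
    return {"base": url.split("#", 1)[0], "fragment": frag}

print(parse_nc_fragment("s3://mybucket/zarr_key/#mode=zarr,s3"))
# → {'base': 's3://mybucket/zarr_key/', 'fragment': {'mode': ['zarr', 's3']}}
```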

I'm not sure where I can get access to the libnetcdf.settings, but the command pip3 show netcdf4 displays the following:

Name: netCDF4
Version: 1.6.0
Summary: Provides an object-oriented python interface to the netCDF version 4 library.
Home-page: http://github.com/Unidata/netcdf4-python
Author: Jeff Whitaker
Author-email: [email protected]
License: License :: OSI Approved :: MIT License
Location: /usr/local/lib64/python3.7/site-packages
Requires: cftime, numpy
Required-by:

cryptoboxcomics avatar Jul 26 '22 15:07 cryptoboxcomics

What OS are you using? Do you know whether, when you installed it, it was built from source or installed from a pre-built wheel (which would be my guess)?

dopplershift avatar Jul 26 '22 19:07 dopplershift

I am not explicitly enabling nczarr-s3 support in the netcdf-c build that is used in the wheels. Should I?

jswhit avatar Jul 26 '22 19:07 jswhit

@jswhit That's where I was going to go. It would be nice to have that support baked-in, though that will require having the aws C SDK available IIUC. @DennisHeimbigner ?

dopplershift avatar Jul 26 '22 19:07 dopplershift

Turning on S3 support by default, as you note, requires having the AWS C++ SDK installed. To date, I have only been able to get that to work using Ubuntu Linux. The AWS SDK is way overkill for what we need, so I keep looking for a streamlined alternative to it, but so far one does not appear to exist.

DennisHeimbigner avatar Jul 26 '22 20:07 DennisHeimbigner

@DennisHeimbigner Had you seen this? https://github.com/awslabs/aws-c-s3

It's rough, but maybe vendoring it would be the lesser of two evils (the other "evil" being essentially useless S3 support)?

dopplershift avatar Jul 26 '22 20:07 dopplershift

I looked at it some time ago. At that point, it had one of the most complex builds I had ever seen because it was divided into a myriad of separate modules. I never even got it to build on Linux. But it may be worth revisiting to see if it is now more buildable.

DennisHeimbigner avatar Jul 26 '22 20:07 DennisHeimbigner

@dopplershift, this was installed from a pre-built wheel. The OS that I'm using is CentOS 7.

cryptoboxcomics avatar Jul 28 '22 15:07 cryptoboxcomics

Sounds like @jswhit 's comment about not having enabled nczarr-s3 support when building the wheels is the cause here.

dopplershift avatar Jul 28 '22 17:07 dopplershift

Hi all, I'm doing the same thing: I'm trying to read netCDF and NCZarr files from a local S3 server (a Ninja S3 server, for instance). I compiled the AWS S3 SDK, netcdf-c, and netcdf4-python, and enabled nczarr-s3 support.

But this code falls into an infinite loop:

import netCDF4

s3_url_dataset_nc = "s3://localhost:9444/data/my_netcdf_file.nczarr"
dataset = netCDF4.Dataset(s3_url_dataset_nc + "#mode=s3,nczarr")

Any idea what I'm doing wrong? Thanks!

CedricPenard avatar Nov 29 '23 11:11 CedricPenard

Out of curiosity @CedricPenard, what platform are you on, and which version of netCDF-C? We're working on getting v4.9.3 out, which improves s3 support. There are some tricky issues (not restricted to netCDF) when working with the Amazon S3 SDK, depending on the platform.

WardF avatar Nov 29 '23 17:11 WardF

As a workaround, try changing

"s3://mybucket/zarr_key/#mode=zarr,s3"
to
"https://s3.amazon.com/mybucket/zarr_key/#mode=zarr,s3"

DennisHeimbigner avatar Nov 29 '23 18:11 DennisHeimbigner

> Out of curiosity @CedricPenard, what platform are you on, and which version of netCDF-C? We're working on getting v4.9.3 out, which improves s3 support. There are some tricky issues (not restricted to netCDF) when working with the Amazon S3 SDK, depending on the platform.

I am on Ubuntu 22.04.
netCDF-c version: 4.9.3-development
netcdf4-python version: 1.7.0

It turns out it's not an infinite loop: a long time later I get this error:

OSError: [Errno -138] NetCDF: S3 error: 's3://localhost:9444/data/SWOT_L2_LR_PreCalSSH_Expert_002_086_20230814T031152_20230814T040129_PIA1_01.nczarr#mode=nczarr,s3'

Same thing with https instead of s3:

OSError: [Errno -138] NetCDF: S3 error: 'https://localhost:9444/data/SWOT_L2_LR_PreCalSSH_Expert_002_086_20230814T031152_20230814T040129_PIA1_01.nczarr#mode=nczarr,s3'

Is a non-Amazon server supported? I'm working with a local Ninja S3 server.

CedricPenard avatar Nov 30 '23 08:11 CedricPenard

We would like to support local servers, but have no way to test it. Try this experiment. Execute this command and post the output.

ncdump -h '[log][show=fetch]https://localhost:9444/data/SWOT_L2_LR_PreCalSSH_Expert_002_086_20230814T031152_20230814T040129_PIA1_01.nczarr#mode=nczarr,s3'

DennisHeimbigner avatar Nov 30 '23 22:11 DennisHeimbigner

Also, the issue may be that the aws-sdk-cpp library does not support non-amazon servers. Starting with netcdf-c-4.9.3, we have an alternate library that may work (or can be made to work) with non-amazon servers.

DennisHeimbigner avatar Nov 30 '23 23:11 DennisHeimbigner

> We would like to support local servers, but have no way to test it. Try this experiment. Execute this command and post the output.
>
> ncdump -h '[log][show=fetch]https://localhost:9444/data/SWOT_L2_LR_PreCalSSH_Expert_002_086_20230814T031152_20230814T040129_PIA1_01.nczarr#mode=nczarr,s3'

Yes, it seems non-Amazon S3 servers are not supported by the aws-sdk:

ncdump -h '[log][show=fetch]https://localhost:9444/data/SWOT_L2_LR_PreCalSSH_Expert_002_086_20230814T031152_20230814T040129_PIA1_01.nczarr#mode=nczarr,s3'
ERR:  curlCode: 28, Timeout was reached key=
ncdump: [log][show=fetch]https://localhost:9444/data/SWOT_L2_LR_PreCalSSH_Expert_002_086_20230814T031152_20230814T040129_PIA1_01.nczarr#mode=nczarr,s3: NetCDF: S3 error

CedricPenard avatar Dec 01 '23 15:12 CedricPenard

Interesting, and good to note. If you were to check out the main branch from Github and compile with the --enable-s3-internal flag, you would be able to test the same URL to see if it works with the integrated S3-SDK alternative.

WardF avatar Dec 01 '23 16:12 WardF

The compilation with -DENABLE_S3_INTERNAL is OK (I use cmake). I now get a different error (and it's immediate):

ncdump -h '[log][show=fetch]https://localhost:9444/nczarrdata/SWOT_L2_LR_PreCalSSH_Expert_002_086_20230814T031152_20230814T040129_PIA1_01.nczarr#mode=nczarr,s3'
ncdump: [log][show=fetch]https://localhost:9444/nczarrdata/SWOT_L2_LR_PreCalSSH_Expert_002_086_20230814T031152_20230814T040129_PIA1_01.nczarr#mode=nczarr,s3: NetCDF: Authorization failure

That's strange, since the bucket is public.

CedricPenard avatar Dec 01 '23 16:12 CedricPenard

@CedricPenard Is localhost:9444 actually HTTPS-protected? The Ninja docs would seem to indicate that it is not by default.

akrherz avatar Dec 01 '23 16:12 akrherz

No, I left the default settings.

What is the command line to provide the key and secret?

CedricPenard avatar Dec 01 '23 16:12 CedricPenard

Hi, I made some tests with another S3 server.

It seems the server's URL is not taken into account:

ncdump -h '[log][show=fetch]https://s3.datalake.cnes.fr/campus-rt-netcdfstreaming/SWOT_L2_HR_PIXC_509_011_242R_20230503T014506_20230503T014517_PIA1_01.nczarr#mode=s3'

NOTE: fetch: https://storage.googleapis.com/campus-rt-netcdfstreaming/SWOT_L2_HR_PIXC_509_011_242R_20230503T014506_20230503T014517_PIA1_01.nczarr.dds
syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <?xml^ version='1.0' encoding='UTF-8'?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist.</Message></Error>
NOTE: fetch complete: 0.537 secs
ncdump: [log][show=fetch]https://s3.datalake.cnes.fr/campus-rt-netcdfstreaming/SWOT_L2_HR_PIXC_509_011_242R_20230503T014506_20230503T014517_PIA1_01.nczarr#mode=s3: NetCDF: file not found

CedricPenard avatar Dec 11 '23 14:12 CedricPenard

Found the problem in the enum of NCS3SVC :

git diff include/ncs3sdk.h
diff --git a/include/ncs3sdk.h b/include/ncs3sdk.h
index 771faa66..b1dbf506 100644
--- a/include/ncs3sdk.h
+++ b/include/ncs3sdk.h
@@ -9,7 +9,7 @@
 /* Track the server type, if known */
 typedef enum NCS3SVC {NCS3UNK=0, /* unknown */
                  NCS3=1,    /* s3.amazon.aws */
-                NCS3GS=0   /* storage.googleapis.com */
+                NCS3GS=2   /* storage.googleapis.com */
 } NCS3SVC;

Now I have this error :

ncdump -h '[log][show=fetch]https://s3.datalake.cnes.fr/campus-rt-netcdfstreaming/SWOT_L2_HR_PIXC_509_011_242R_20230503T014506_20230503T014517_PIA1_01.nczarr#mode=s3'
>>> NC_s3urlrebuild: final=[log][show=fetch]https://s3.datalake.cnes.fr/campus-rt-netcdfstreaming/SWOT_L2_HR_PIXC_509_011_242R_20230503T014506_20230503T014517_PIA1_01.nczarr#mode=s3 bucket=campus-rt-netcdfstreaming region=us-east-1
NOTE: fetch: https://s3.datalake.cnes.fr/campus-rt-netcdfstreaming/SWOT_L2_HR_PIXC_509_011_242R_20230503T014506_20230503T014517_PIA1_01.nczarr.dds
ERR: curl error: SSL peer certificate or SSH remote key was not OK
curl error details: 
WARN: oc_open: Could not read url
NOTE: fetch complete: 0.025 secs
ncdump: [log][show=fetch]https://s3.datalake.cnes.fr/campus-rt-netcdfstreaming/SWOT_L2_HR_PIXC_509_011_242R_20230503T014506_20230503T014517_PIA1_01.nczarr#mode=s3: NetCDF: I/O failure

CedricPenard avatar Dec 11 '23 14:12 CedricPenard

Part of the problem is that the URL you are using is being treated as if it were a DAP2 URL. Try changing the "#mode=s3" at the end to "#mode=zarr,s3" and see if it gets any further along.

DennisHeimbigner avatar Dec 11 '23 16:12 DennisHeimbigner

With #mode=zarr,s3 I get an "S3 error" without any other information.

Edit: OK, it's a problem with the curl request and authentication. It seems that curl doesn't take ~/.aws/credentials into account. I will try to see why.

CedricPenard avatar Dec 12 '23 12:12 CedricPenard

When I put "https://s3.datalake.cnes.fr/" into my browser, it says the site does not exist.

DennisHeimbigner avatar Dec 12 '23 20:12 DennisHeimbigner

Yes it's only accessible locally.

CedricPenard avatar Dec 13 '23 07:12 CedricPenard

Hello,

NCH5_s3comms_load_aws_profile is not called. How and where are credentials managed?

CedricPenard avatar Dec 15 '23 14:12 CedricPenard

See the following functions:

  1. libdispatch/ds3util.c#NC_s3sdkinitialize()
  2. libdispatch/ds3util.c#NC_aws_load_credentials()
  3. libdispatch/ds3util.c#NC_getactives3profile()
  4. libdispatch/ds3util.c#NC_authgets3profile()

If memory serves, (1) is called at initialization to load various environment variables, some of which affect the loading of profile information. Function (2) is called at initialization to load from .aws/config + .aws/credentials, as controlled by the info loaded by (1). Function (3) is called at various points to get the current "active" profile, as determined by the URL, various environment variables, or the info read in (1) and (2). Function (4) uses (3) to search the list of loaded profiles.
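The .aws/credentials files these functions consult are plain INI files. A rough stdlib-only model of the profile lookup (the helper name and behavior are illustrative assumptions, not netcdf-c's actual logic):

```python
# Illustrative sketch (not netcdf-c's API): parse an AWS-style credentials
# file and select the profile named by AWS_PROFILE, defaulting to "default".
import configparser

SAMPLE = """\
[default]
aws_access_key_id = AKIDEFAULT
aws_secret_access_key = secret0

[ninja]
aws_access_key_id = AKININJA
aws_secret_access_key = secret1
"""

def active_profile(text: str, env: dict) -> dict:
    """Return the key/value pairs of the active profile, or {} if absent."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    name = env.get("AWS_PROFILE", "default")
    return dict(cp[name]) if cp.has_section(name) else {}

print(active_profile(SAMPLE, {"AWS_PROFILE": "ninja"})["aws_access_key_id"])
# → AKININJA
```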

DennisHeimbigner avatar Dec 16 '23 00:12 DennisHeimbigner