ZARR implementation status

Open durack1 opened this issue 4 years ago • 15 comments

Hi folks, I read https://www.unidata.ucar.edu/blogs/news/entry/netcdf-and-native-cloud-storage with interest and am curious about how this project is progressing.

Would it be possible to create a new project for this task, so that it's more transparent as to how work is progressing (and where help is needed)?

durack1 avatar Mar 23 '20 19:03 durack1

Having a separate project might be problematic, as it is part of the C library and not a stand-alone project/executable/library. I'll create a tag, however, that we can use to filter on as a stop-gap; I'm open to being persuaded by counterarguments or other solutions we could adopt. As for progress, I'll tag @DennisHeimbigner in explicitly.

WardF avatar Mar 23 '20 20:03 WardF

I am close to releasing an experimental version so I am not sure that a separate project is warranted. The "caveats" are as follows:

  • Everything needs more testing.
  • Performance is not great because I need to add optimizations of the chunking support. Currently it uses the most general algorithm and does not provide for special cases that make significant speed improvements.
  • We do not yet have a usable S3 storage driver; the storage drivers we have can only use a local file system. I believe this storage format is compatible with the corresponding driver in the Python version of zarr. I am working to support S3 storage, but that is hampered by lack of access to S3.
  • Compression is not supported until we can come up with a standard for dynamic loading of compressors that is acceptable to the existing zarr implementations.
  • The code is pretty "dirty". It was created by cloning and modifying the existing netcdf-c/libhdf5 code, so it has a lot of extraneous disabled comments and code to remind me of what needs to be added.
  • The code is a bit unstable because I am continually refactoring it.
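
For readers unfamiliar with the on-disk layout mentioned in the storage-driver caveat, the following is a minimal sketch of a local directory store, assuming the Zarr version 2 layout used by the Python zarr package (a `.zarray` JSON metadata file plus one file per chunk, named by chunk indices joined with a dot). It is not taken from the netcdf-c code; the helper names `write_zarr_v2_array` and `read_chunk` are hypothetical, and the array is written uncompressed to mirror the "no compression yet" caveat.

```python
import json
import os
import struct
import tempfile


def write_zarr_v2_array(root, shape, chunks, values):
    """Write a 2-D little-endian int32 array as an uncompressed Zarr v2 directory.

    Hypothetical helper for illustration only; assumes the Zarr v2 layout.
    """
    os.makedirs(root, exist_ok=True)
    meta = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": "<i4",
        "compressor": None,  # no compression, matching the caveat above
        "fill_value": 0,
        "order": "C",
        "filters": None,
    }
    with open(os.path.join(root, ".zarray"), "w") as f:
        json.dump(meta, f)

    rows, cols = shape
    crows, ccols = chunks
    # Ceiling division gives the number of chunks along each dimension.
    for ci in range(-(-rows // crows)):
        for cj in range(-(-cols // ccols)):
            buf = []
            for r in range(ci * crows, (ci + 1) * crows):
                for c in range(cj * ccols, (cj + 1) * ccols):
                    inside = r < rows and c < cols
                    # Edge chunks are stored full-size and padded with the fill value.
                    buf.append(values[r][c] if inside else meta["fill_value"])
            with open(os.path.join(root, f"{ci}.{cj}"), "wb") as f:
                f.write(struct.pack(f"<{len(buf)}i", *buf))


def read_chunk(root, ci, cj):
    """Read one raw chunk back as a flat list of ints (row-major within the chunk)."""
    with open(os.path.join(root, ".zarray")) as f:
        meta = json.load(f)
    n = meta["chunks"][0] * meta["chunks"][1]
    with open(os.path.join(root, f"{ci}.{cj}"), "rb") as f:
        return list(struct.unpack(f"<{n}i", f.read()))


# Usage: a 3x4 array split into 2x2 chunks yields four chunk files.
demo_root = os.path.join(tempfile.mkdtemp(), "demo.zarr")
data = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
write_zarr_v2_array(demo_root, (3, 4), (2, 2), data)
print(sorted(os.listdir(demo_root)))
```

Because every chunk is just a named blob next to a small JSON file, the same key/value layout maps naturally onto an object store such as S3, which is why the storage driver can be swapped independently of the format.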

DennisHeimbigner avatar Mar 23 '20 21:03 DennisHeimbigner

@WardF I think @durack1 is referring to a GitHub "project management" project that lists all the relevant issues and shows their status.

dopplershift avatar Mar 23 '20 22:03 dopplershift

I think by project, @durack1 means a github project for tracking issues/milestones (much like the "Thread Safety" project at https://github.com/Unidata/netcdf-c/projects/6), and not a separate github repository. As testing and feedback are provided, tracking those new issues as a project might result in having extra fingers on keyboards to help tackle the issues, as well as provide a roadmap on where things stand for those actively watching.

lesserwhirls avatar Mar 23 '20 22:03 lesserwhirls

Thank you both, @dopplershift and @lesserwhirls; I see that now. We can probably do that. I will discuss and coordinate with @DennisHeimbigner tomorrow.

WardF avatar Mar 23 '20 22:03 WardF

Sure we can do that. My above message should perhaps be a first entry for that project.

DennisHeimbigner avatar Mar 23 '20 22:03 DennisHeimbigner

@dopplershift @lesserwhirls thanks for expanding on my comment and clarifying; it is exactly what I meant. I understand the code will be adapted, but the discussion in https://github.com/Unidata/netcdf-c/issues/1677#issuecomment-602886363 was exactly my motivation. You will have many eyes watching this as it evolves.

durack1 avatar Mar 24 '20 01:03 durack1

I now have an S3 storage driver that compiles and builds. Adding a C++ file to the build is a real poison pill because it requires that the C++ standard libraries be loaded (-lstdc++).

DennisHeimbigner avatar Mar 27 '20 02:03 DennisHeimbigner

Have you looked at the AWS SDK for C? I stumbled across the aws-c-common just yesterday. Looks like there are other components (e.g., aws-c-io and aws-c-compression) but I haven't found a full list and not sure of the status of the overall project.

ethanrd avatar Mar 27 '20 03:03 ethanrd

Yes, it is because I am using the AWS S3 SDK (which is C++ only) that I am having the poison pill problem.

DennisHeimbigner avatar Mar 27 '20 03:03 DennisHeimbigner

Is there actually a problem with defining CC=g++? What about manually linking in libstdc++?

dopplershift avatar Mar 27 '20 05:03 dopplershift

Looking at the projects Ethan listed (which appear to be C, not C++), could they work for us? At a glance they appear to be active and cross-platform. I’m on mobile at the moment but will look closer.

WardF avatar Mar 27 '20 22:03 WardF

I am looking at it now. One question is: do we trust this organization to be around for a while?

DennisHeimbigner avatar Mar 27 '20 23:03 DennisHeimbigner

I do not see any evidence that this supports S3. Am I missing something?

DennisHeimbigner avatar Mar 28 '20 00:03 DennisHeimbigner

I didn't see S3 support either but wasn't sure if I had found all the aws-c-* projects.

Also, given there appears to be at least some level of Amazon connection (the AWSLabs GitHub organization is "verified to control the amazon.com domain"), it seemed like it might be worth some digging.

ethanrd avatar Mar 30 '20 23:03 ethanrd