Reading local Zarr files into stars

Open oshuwilson opened this issue 1 year ago • 14 comments

Hi,

After looking at the vignette for reading Zarr files in stars, I am unsure how to read local Zarr directories into R. I have been trying to work with satellite imagery for the Southern Ocean downloaded from Copernicus' Marine Data Client.

Here is my attempt at coding this

`library(stars)

dsn <- 'ZARR:"sic_daily_samples.zarr/"'

read_mdim(dsn)`

Which gives the error message

Error in CPL_read_mdim(file, array_name, options, offset, count, step, :
  CHAR() can only be applied to a 'CHARSXP', not a 'NULL'
In addition: Warning messages:
1: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
2: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
3: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
4: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
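
The warnings name blosc as the missing decompressor. One way to confirm which compressor a local store actually uses is to read the `.zarray` metadata file that the Zarr v2 layout keeps next to each array. The following is a minimal sketch, not part of stars: `zarr_compressors()` is a hypothetical helper, the jsonlite package is assumed to be installed, and the `demo.zarr` directory is a stand-in built on the fly so the code runs without the real data.

```r
library(jsonlite)

# Hypothetical helper: report the compressor id stored in every .zarray
# file found under a Zarr v2 directory store.
zarr_compressors <- function(store) {
  meta_files <- list.files(store, pattern = "^\\.zarray$", all.files = TRUE,
                           recursive = TRUE, full.names = TRUE)
  ids <- vapply(meta_files, function(f) {
    id <- fromJSON(f)$compressor$id
    # "compressor": null means the chunks are stored uncompressed
    if (is.null(id)) NA_character_ else id
  }, character(1))
  # Label each result with the name of the array's directory
  names(ids) <- basename(dirname(meta_files))
  ids
}

# Stand-in store with a single array called "siconc" (assumed layout)
store <- file.path(tempdir(), "demo.zarr")
dir.create(file.path(store, "siconc"), recursive = TRUE, showWarnings = FALSE)
writeLines('{"zarr_format": 2, "compressor": {"id": "blosc", "cname": "lz4"}}',
           file.path(store, "siconc", ".zarray"))

ids <- zarr_compressors(store)
print(ids)
```

Pointing `zarr_compressors()` at the unzipped `sic_daily_samples.zarr` directory instead of the stand-in should show whether its arrays are blosc-compressed, which is the case the GDAL build has to support.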

I've uploaded a subset of the data for ease but I can't figure out how to read it as a zipped or unzipped file, so any help with this would be appreciated!

Thanks, Josh

sic_daily_samples.zarr.zip

oshuwilson avatar Jan 30 '24 11:01 oshuwilson

I get

> read_mdim("sic_daily_sample.zarr/")
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
           Min. 1st Qu. Median Mean 3rd Qu. Max.  NA's
siconc [1]   NA      NA     NA  NaN      NA   NA 1e+05
dimension(s):
          from   to  refsys point
longitude    1 4320  WGS 84    NA
latitude     1  961  WGS 84    NA
time         1    1 POSIXct  TRUE
                                                      values x/y
longitude       [-180.0417,-179.9583),...,[179.875,179.9583) [x]
latitude  [-80.04167,-79.95833),...,[-0.04166667,0.04166667) [y]
time                                          2021-01-09 UTC    

What is your sessionInfo() and sf_extSoftVersion() output, after loading stars?

edzer avatar Jan 30 '24 13:01 edzer

Thanks Edzer, I tried the same code and got the same error message.

My sessionInfo() gives

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stars_0.6-4 sf_1.0-14   abind_1.4-5

loaded via a namespace (and not attached):
 [1] utf8_1.2.4         R6_2.5.1           tidyselect_1.2.0   e1071_1.7-13       magrittr_2.0.3    
 [6] glue_1.6.2         tibble_3.2.1       KernSmooth_2.23-22 parallel_4.3.2     pkgconfig_2.0.3   
[11] generics_0.1.3     dplyr_1.1.3        lifecycle_1.0.4    classInt_0.4-10    cli_3.6.1         
[16] fansi_1.0.5        vctrs_0.6.4        grid_4.3.2         DBI_1.2.1          proxy_0.4-27      
[21] class_7.3-22       compiler_4.3.2     rstudioapi_0.15.0  tools_4.3.2        pillar_1.9.0      
[26] Rcpp_1.0.11        rlang_1.1.2        units_0.8-4       

And my sf_extSoftVersion() prints

          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.11.2"        "3.7.2"        "9.3.0"         "true"         "true"        "9.3.0" 

oshuwilson avatar Jan 30 '24 13:01 oshuwilson

Please update sf to 1.0-15, and try again.

edzer avatar Jan 30 '24 13:01 edzer
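
Before retrying, it can help to confirm which sf version R actually loads. This is a minimal sketch using only base R; the `required` value is taken from the suggestion above, and the messages are illustrative. Note that the thread goes on to show that upgrading alone did not remove the blosc error.

```r
# Minimum sf version suggested in this thread
required <- package_version("1.0-15")

# packageVersion() raises an error when the package is missing, so guard it
installed <- tryCatch(packageVersion("sf"), error = function(e) NULL)

if (is.null(installed)) {
  message("sf is not installed; try install.packages(\"sf\")")
} else if (installed < required) {
  message("sf ", format(installed), " is older than ", format(required),
          "; try install.packages(\"sf\")")
} else {
  message("sf ", format(installed), " satisfies the suggested minimum")
}
```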

That still printed the same error message as previously. I haven't yet downloaded the latest version of RStudio but I don't imagine that would cause this error?

oshuwilson avatar Jan 30 '24 13:01 oshuwilson

See also https://github.com/r-spatial/stars/issues/566#issuecomment-1261880743

edzer avatar Jan 30 '24 13:01 edzer

Apologies, I'm not yet proficient with R. How do I install that patch? I tried using remotes::install_github("rspatial/sf") but I'm still seeing the same error code.

oshuwilson avatar Jan 30 '24 14:01 oshuwilson

No need for you to install that patch.

edzer avatar Jan 30 '24 14:01 edzer

Sorry, I'm a bit lost as to which steps from the other issue I can take to fix mine.

oshuwilson avatar Jan 30 '24 14:01 oshuwilson

I'm just cross linking them; I can reproduce the error on GitHub actions here: https://github.com/r-spatial/stars/actions/runs/7712573313/job/21020420577#step:6:297

edzer avatar Jan 30 '24 14:01 edzer

@oshuwilson,

It seems that this issue is specific to the Windows binary release. Note that you can also use CopernicusMarine for subsetting Copernicus Marine data. However, it does not yet support ZARR data because of the issue reported here and in https://github.com/r-spatial/stars/issues/566#issuecomment-1261880743

pepijn-devries avatar Jan 30 '24 15:01 pepijn-devries

Thanks @pepijn-devries - I'll look at doing that to download as a netCDF if the Zarr format remains unusable for my setup. My main issue is that the full data I need is massive (~1.3TB as a netCDF but only ~250GB as Zarr), so Zarr would be preferable if it can work! But if not, I'll get a new hard drive and put my computer to the test.

oshuwilson avatar Jan 30 '24 15:01 oshuwilson

> It seems that this issue is specific to the Windows binary release.

Windows and macOS binary releases; we added blosc, at least to the Windows binary builds, but this suggests it's not working.

edzer avatar Jan 30 '24 16:01 edzer

Hi @edzer,

Is there any news on the Windows build and blosc decompression of ZARR files? Thanks for your work on the package!

By the way, I did some additional testing. The issue does not only occur on Windows, but also on a Linux Fedora (virtual) machine I have set up:

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.12.1, GDAL 3.7.3, PROJ 9.2.1; sf_use_s2() is TRUE
dsn <- 'ZARR:"/vsicurl/https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr"'
bounds <- c(longitude = "lon_bounds", latitude = "lat_bounds")
r <- read_mdim(dsn, bounds = bounds)
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Error in CPL_read_mdim(file, array_name, options, offset, count, step, : CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

Created on 2024-03-11 with reprex v2.1.0

With sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 39 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=nl_NL.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8        LC_COLLATE=nl_NL.UTF-8    
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=nl_NL.UTF-8    LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Amsterdam
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] gtable_0.3.4       dplyr_1.1.4        compiler_4.3.2     tidyselect_1.2.0   reprex_2.1.0       Rcpp_1.0.12       
 [7] clipr_0.8.0        callr_3.7.5        scales_1.3.0       yaml_2.3.8         fastmap_1.1.1      ggplot2_3.5.0     
[13] R6_2.5.1           generics_0.1.3     classInt_0.4-10    sf_1.0-15          knitr_1.45         tibble_3.2.1      
[19] units_0.8-5        munsell_0.5.0      DBI_1.2.2          pillar_1.9.0       rlang_1.1.3        utf8_1.2.4        
[25] xfun_0.42          fs_1.6.3           cli_3.6.2          withr_3.0.0        magrittr_2.0.3     ps_1.7.6          
[31] class_7.3-22       processx_3.8.3     digest_0.6.34      grid_4.3.2         rstudioapi_0.15.0  lifecycle_1.0.4   
[37] vctrs_0.6.5        KernSmooth_2.23-22 proxy_0.4-27       evaluate_0.23      glue_1.7.0         fansi_1.0.6       
[43] e1071_1.7-14       colorspace_2.1-0   rmarkdown_2.26     tools_4.3.2        pkgconfig_2.0.3    htmltools_0.5.7   

pepijn-devries avatar Mar 11 '24 08:03 pepijn-devries

Same here, using macOS.

library(stars)
> dsn = 'ZARR:"/vsicurl/https://storage.googleapis.com/cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/highresSST-present/r1i1p1f1/6hrPlev/psl/gn/v20170706"/'
> gdal_utils("info", dsn)
Warning messages:
1: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
2: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
3: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
4: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
5: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
6: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
7: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled

With sessionInfo():

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-16

loaded via a namespace (and not attached):
 [1] compiler_4.3.1     magrittr_2.0.3     class_7.3-22       DBI_1.2.3          tools_4.3.1        units_0.8-5        proxy_0.4-27       rstudioapi_0.16.0  Rcpp_1.0.13        KernSmooth_2.23-24 grid_4.3.1         e1071_1.7-14       classInt_0.4-10 

Artur-man avatar Aug 01 '24 20:08 Artur-man