terra icon indicating copy to clipboard operation
terra copied to clipboard

Issues with reading HDF5 files using `terra::rast()`

Open dimfalk opened this issue 2 years ago • 19 comments

Hello! I'm facing some problems with reading HDF5 files using terra at the moment but, to be honest, I have to mention I'm not completely sure whether this is the right place to ask for help - ergo, if this is really terra-related - or just a result of my lack of understanding of the HDF5 format or terra functionality respectively.

However, I'll give it a try and if this is not the right place, please tell me.

I'm working with precipitation raster data stored in HDF5 containers (data sample: hd2011010000.zip).

terra::rast("hd2011010000.scu", subds = "/dataset_DXk/image")
#> Warning: [rast] unknown extent
#> class       : SpatRaster 
#> dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#> resolution  : 0.001865672, 0.001865672  (x, y)
#> extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source      : hd2011010000.scu 
#> name        : hd2011010000

Created on 2022-06-23 by the reprex package (v2.0.1)

Reading seems fine at first sight until I attempt to plot() the resulting SpatRaster object or perform some basic calculations (like a division by a certain factor to adjust units):

HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
 #000: ../../src/H5Dio.c line 192 in H5Dread(): can't read data
   major: Dataset
   minor: Read failed
 #001: ../../src/H5VLcallback.c line 2080 in H5VL_dataset_read(): dataset read failed
   major: Virtual Object Layer
   minor: Read failed
 #002: ../../src/H5VLcallback.c line 2046 in H5VL__dataset_read(): dataset read failed
   major: Virtual Object Layer
   minor: Read failed
 #003: ../../src/H5VLnative_dataset.c line 167 in H5VL__native_dataset_read(): can't read data
   major: Dataset
   minor: Read failed
 #004: ../../src/H5Dio.c line 567 in H5D__read(): can't read data
   major: Dataset
   minor: Read failed
 #005: ../../src/H5Dchunk.c line 2594 in H5D__chunk_read(): unable to read raw data chunk
   major: Low-level I/O
   minor: Read failed
 #006: ../../src/H5Dchunk.c line 3957 in H5D__chunk_lock(): data pipeline read failed
   major: Dataset
   minor: Filter operation failed
 #007: ../../src/H5Z.c line 1311 in H5Z_pipeline(): required filter 'szip' is not registered
   major: Data filters
   minor: Read failed
 #008: ../../src/H5PLint.c line 274 in H5PL_load(): search in path table failed
   major: Plugin for dynamically loaded library
   minor: Can't get value
 #009: ../../src/H5PLpath.c line 604 in H5PL__find_plugin_in_path_table(): search in path C:\ProgramData\hdf5\lib\plugin encountered an error
   major: Plugin for dynamically loaded library
   minor: Can't get value
 #010: ../../src/H5PLpath.c line 734 in H5PL__find_plugin_in_path(): can't open directory
   major: Plugin for dynamically loaded library
   minor: Can't open directory or file
Error: [readValues] cannot read values
In addition: Warning messages:
1: [rast] unknown extent

2: H5Dread() failed for block. (GDAL error 1) 
3: HDF5:"hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X offset 0, Y offset 0: H5Dread() failed for block. (GDAL error 1) 

Although *.scu does not seem to be a common file extension for the HDF5 format, hdf5-tools seems to be able to read this, c.f. output from h5debug and h5dump.

Moreover, the file is recognized using HDFView 3.14 in terms of the content being plotable. Making use of Panoply 5.1.0 the image layer is not recognized but this could also be a result of the need to change the file extension to *.hdf5 in order to be recognized by the software, no idea tbh. Maybe the structure of the container is not quite up to standards so that reading is failing with some products?

Before trying to migrate from raster to terra I was making use of rhdf5 to read these files but I'd like to reduce further dependencies if possible, hence my attempt to solve this using terra only.

rhdf5::h5read("hd2011010000.scu", "/dataset_DXk/image") |> t() |> raster::raster()
#> class      : RasterLayer 
#> dimensions : 536, 536, 287296  (nrow, ncol, ncell)
#> resolution : 0.001865672, 0.001865672  (x, y)
#> extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#> crs        : NA 
#> source     : memory
#> names      : layer 
#> values     : -999, 15  (min, max)

Created on 2022-06-23 by the reprex package (v2.0.1)

I'd really like to understand why reading fails here. Is there something I can do to be able to read these files using terra?

Thank you very much in advance!

dimfalk avatar Jun 23 '22 15:06 dimfalk

Below is what GDAL sees (with "GDALinfo"). It suggests that this file is not organized as a typical geospatial raster file.

describe("hd2011010000.scu")
# [1] "Driver: HDF5Image/HDF5 Dataset"                        
# [2] "Files: hd2011010000.scu"                               
# [3] "Size is 536, 536"                                      
# [4] "Metadata:"                                             
# [5] "  dataset_DXk_what_elev_[deg]=0.5 "                    
# [6] "  dataset_DXk_what_enddate_[YYYYMMDD]=20201101"        
# [7] "  dataset_DXk_what_endtime_[HHmmss]=000000"            
# [8] "  dataset_DXk_what_gain=1 "                            
# [9] "  dataset_DXk_what_nodata=-999 "                       
#[10] "  dataset_DXk_what_ori_format=DX"                      
#[11] "  dataset_DXk_what_ori_name=HD2011010000.scu"          
#[12] "  dataset_DXk_what_product=COMP  "                     
#[13] "  dataset_DXk_what_quantity=RATE "                     
#[14] "  dataset_DXk_what_rad=1 "                             
#[15] "  dataset_DXk_where_xsize=536 "                        
#[16] "  dataset_DXk_where_ysize=536 "                        
#[17] "  how_a1gate=1512001264 "                              
#[18] "  how_adjustment=1 "                                   
#[19] "  how_angles_[deg]=0.5 "                               
#[20] "  how_doppler=4 "                                      
#[21] "  how_maxlevel_[dBZ]=65 "                              
#[22] "  how_minlevel_[dBZ]=0 "                               
#[23] "  how_nodes=ESS,NHB,FLD,"                              
#[24] "  how_nodesn=3 "                                       
#[25] "  how_place=fld"                                       
#[26] "  how_resolution_[dBZ]=0.5 "                           
#[27] "  how_wavelength_[cm]=5 "                              
#[28] "  how_WMO=121 "                                        
#[29] "  how_zr_a=256 "                                       
#[30] "  how_zr_b=1.42 "                                      
#[31] "  what_date_[YYYYMMDD]=20201101\\"                     
#[32] "  what_object=IMAGE"                                   
#[33] "  what_sets=1 "                                        
#[34] "  what_time_[HHmmss]=000000\\"                         
#[35] "  what_version_[H5Vol_0.2]=SCP_BIG 5.0.11.24   "       
#[36] "  where_poserror="                                     
#[37] "  where_projdef=UTM_N32"                               
#[38] "  where_range=128 "                                    
#[39] "  where_start_lat_[deg]=5943 "                         
#[40] "  where_start_lon_[deg]=195 "                          
#[41] "  where_UL_ipixel=195 "                                
#[42] "  where_UL_jpixel=5943 "                               
#[43] "  where_xscale_[m]=1000 "                              
#[44] "  where_xsize=536 "                                    
#[45] "  where_yscale_[m]=1000 "                              
#[46] "  where_ysize=536 "                                    
#[47] "Corner Coordinates:"                                   
#[48] "Upper Left  (    0.0,    0.0)"                         
#[49] "Lower Left  (    0.0,  536.0)"                         
#[50] "Upper Right (  536.0,    0.0)"                         
#[51] "Lower Right (  536.0,  536.0)"                         
#[52] "Center      (  268.0,  268.0)"                         
#[53] "Band 1 Block=64x72 Type=Float32, ColorInterp=Undefined"
#[54] "  Metadata:"                                           
#[55] "    dataset_DXk_image_CLASS=IMAGE"                     
#[56] "    dataset_DXk_image_IMAGE_BACKGROUNDINDEX=0 "        
#[57] "    dataset_DXk_image_IMAGE_COLORMODEL=RGB"            
#[58] "    dataset_DXk_image_IMAGE_MINMAXRANGE=0 0 "          
#[59] "    dataset_DXk_image_IMAGE_SUBCLASS=BITMAP"           
#[60] "    dataset_DXk_image_IMAGE_VERSION=1.2"               
#[61] "    dataset_DXk_image_PALETTE="  

I suspected that the GDAL HDF5 driver made a wrong assumption, and thus failed when reading the file. However, the GDAL command line tool gdal_translate works, as this creates a valid GTiff

gdal_translate -of GTiff hd2011010000.scu test.tif

So that suggests that terra makes a wrong assumption. I need to investigate that.

rhijmans avatar Jun 23 '22 17:06 rhijmans

Thank your for your fast reply! I was able to reproduce this behaviour with GDAL 3.0.4 - thanks for the hint using gdalinfo and gdal_translate! Let me know if you need further information on the data.

By the way, stars::read_stars("hd2011010000.scu") fails with the same error message.

dimfalk avatar Jun 23 '22 21:06 dimfalk

By the way, stars::read_stars("hd2011010000.scu") fails with the same error message.

I see no errors, and

library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
(r = read_stars("hd2011010000.scu"))
# stars object with 2 dimensions and 1 attribute
# attribute(s):
#                   Min. 1st Qu. Median      Mean 3rd Qu. Max.
# hd2011010000.scu  -999    -999   -999 -587.3865       0   15
# dimension(s):
#   from  to offset delta refsys point values x/y
# x    1 536      0     1     NA    NA   NULL [x]
# y    1 536    536    -1     NA    NA   NULL [y]
r[r == -999] = NA
plot(r)

gives

x

The file seems to lack a missing value flag, and the array is not georeferenced, as you noticed.

edzer avatar Jun 27 '22 14:06 edzer

This is what I see with CRAN-stars:

library(stars)
#Loading required package: abind
#Loading required package: sf
#Linking to GEOS 3.9.1, GDAL 3.3.2, PROJ 7.2.1; sf_use_s2() is TRUE
(r = read_stars("hd2011010000.scu"))
#Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  : 
#  read failure
#In addition: Warning messages:
#1: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  :
#  GDAL Error 1: H5Dread() failed for block.
#2: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  :
#  GDAL Error 1: HDF5:"C:\Users\rhijm\Downloads\hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X #offset 0, Y offset 0: H5Dread() failed for block.

On windows with this sessionInfo()

R Under development (unstable) (2022-05-14 r82360 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stars_0.5-5 sf_1.0-7    abind_1.4-5

loaded via a namespace (and not attached):
 [1] crayon_1.5.1       vctrs_0.4.1        cli_3.3.0          rlang_1.0.2        DBI_1.1.2          KernSmooth_2.23-20 purrr_0.3.4       
 [8] generics_0.1.2     assertthat_0.2.1   glue_1.6.2         lwgeom_0.2-8       e1071_1.7-9        fansi_1.0.3        grid_4.3.0        
[15] classInt_0.4-3     tibble_3.1.7       ellipsis_0.3.2     lifecycle_1.0.1    compiler_4.3.0     dplyr_1.0.9        Rcpp_1.0.8.3      
[22] pkgconfig_2.0.3    R6_2.5.1           class_7.3-20       tidyselect_1.1.2   utf8_1.2.2         parallel_4.3.0     pillar_1.7.0      
[29] magrittr_2.0.3     tools_4.3.0        proxy_0.4-26       units_0.8-0       

rhijmans avatar Jun 27 '22 17:06 rhijmans

And with the development version as well. So perhaps this was fixed somewhere in GDAL > 3.3.2?

rhijmans avatar Jun 27 '22 17:06 rhijmans

It works for me on Ubuntu with GDAL 3.4.0:

library(stars)
#Loading required package: abind
#Loading required package: sf
#Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE
(r = read_stars("hd2011010000.scu"))
#stars object with 2 dimensions and 1 attribute
#attribute(s):
#                  Min. 1st Qu. Median      Mean 3rd Qu. Max.
#hd2011010000.scu  -999    -999   -999 -587.3865       0   15
#dimension(s):
#  from  to offset delta refsys point values x/y
#x    1 536      0     1     NA    NA   NULL [x]
#y    1 536    536    -1     NA    NA   NULL [y]

r <- rast("hd2011010000.scu")
NAflag(r) <- -999
r * 1
#class       : SpatRaster
#dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#resolution  : 0.001865672, 0.001865672  (x, y)
#extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84
#source      : memory
#name        : hd2011010000
#min value   :            0
#max value   :           15

Although on another Ubuntu with GDAL 2.2.3 it works with stars but not with terra. So there may be more to it.

rhijmans avatar Jun 27 '22 17:06 rhijmans

It could also possibly be related to how R GDAL HDF5 is built on windows: https://github.com/OSGeo/gdal/issues/1428

rhijmans avatar Jun 27 '22 18:06 rhijmans

So terra puts a raster without georeference on the unit square, where GDAL (and stars) put them on the rectangle from (0,0) to (rcol,nrow). Is that on purpose?

edzer avatar Jun 27 '22 19:06 edzer

I was about to say it cannot be ruled out I messed up a bit in my last late-night-session considering stars, but seems like you're already some steps ahead of this.. 😏

Nevertheless, just to add this as a supplement:

library(stars)
#> Lade nötiges Paket: abind
#> Lade nötiges Paket: sf
#> Linking to GEOS 3.9.1, GDAL 3.3.2, PROJ 7.2.1; sf_use_s2() is TRUE

(r <- read_stars("hd2011010000.scu"))
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 1: H5Dread() failed for block.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 1: HDF5:"hd2011010000.scu":
#> //dataset_DXk/image, band 1: IReadBlock failed at X offset 0, 
#> Y offset 0: H5Dread() failed for block.
#> Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver), : read failure
sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
#> [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base    
#> 
#> other attached packages:
#> [1] stars_0.5-5 sf_1.0-7    abind_1.4-5
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3       pillar_1.7.0       compiler_4.2.0     highr_0.9         
#>  [5] class_7.3-20       R.methodsS3_1.8.2  R.utils_2.11.0     tools_4.2.0       
#>  [9] digest_0.6.29      evaluate_0.15      lifecycle_1.0.1    tibble_3.1.7      
#> [13] R.cache_0.15.0     pkgconfig_2.0.3    rlang_1.0.2        reprex_2.0.1      
#> [17] cli_3.3.0          DBI_1.1.3          rstudioapi_0.13    parallel_4.2.0    
#> [21] yaml_2.3.5         xfun_0.31          fastmap_1.1.0      e1071_1.7-11      
#> [25] dplyr_1.0.9        withr_2.5.0        styler_1.7.0       stringr_1.4.0     
#> [29] knitr_1.39         generics_0.1.2     fs_1.5.2           vctrs_0.4.1       
#> [33] tidyselect_1.1.2   grid_4.2.0         classInt_0.4-7     glue_1.6.2        
#> [37] R6_2.5.1           fansi_1.0.3        rmarkdown_2.14     purrr_0.3.4       
#> [41] magrittr_2.0.3     units_0.8-0        ellipsis_0.3.2     htmltools_0.5.2   
#> [45] assertthat_0.2.1   KernSmooth_2.23-20 utf8_1.2.2         proxy_0.4-27      
#> [49] stringi_1.7.6      lwgeom_0.2-8       crayon_1.5.1       R.oo_1.25.0

dimfalk avatar Jun 27 '22 19:06 dimfalk

Yes, sorry, I forgot to say that I was looking at stars from github, remotes::install_github("r-spatial/stars")

edzer avatar Jun 27 '22 20:06 edzer

@edzer: terra assigns an extent of (0, ncol, 0, nrow) when creating a SpatRaster from a matrix (#1); but it is (0,1,0,1) when a file does not supply an extent. I cannot say that there is a strong reason for that.

To get the dev version of stars on windows I would instead do install.packages('terra', repos='https://rspatial.r-universe.dev') --- but either way, that won't change anything in this case.

I assume that this is a GDAL bug that will go away once R-tools gets to GDAL >= 3.4.0; so I am closing the issue.

rhijmans avatar Jun 27 '22 20:06 rhijmans

Ok, thank you very much for your efforts and all the explanations!

terra assigns an extent of (0, ncol, 0, nrow) when creating a SpatRaster from a matrix (https://github.com/rspatial/terra/issues/1); but it is (0,1,0,1) when a file does not supply an extent. I cannot say that there is a strong reason for that.

I tried providing crs and extent parameters before opening this issue - presumably I was hoping to be able to override the defaults of the container resp. supply missing values - but this did not really seem to change anything:

library(terra)
#> terra 1.5.34

terra::rast("hd2011010000.scu")
#> Warning: [rast] unknown extent
#> class       : SpatRaster 
#> dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#> resolution  : 0.001865672, 0.001865672  (x, y)
#> extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source      : hd2011010000.scu 
#> name        : hd2011010000

terra::rast("hd2011010000.scu", 
            crs = sf::st_crs(25832)$proj4string)
#> Error in .local(x, ...): unused argument (crs = "+proj=utm +zone=32 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")

terra::rast("hd2011010000.scu", 
            extent = c(195000, 731000, 5407000, 5943000) |> terra::ext())
#> Error in .local(x, ...): unused argument (extent = new("SpatExtent", ptr = new("Rcpp_SpatExtent", .xData = <environment>)))

I assume that this is a GDAL bug that will go away once R-tools gets to GDAL >= 3.4.0; so I am closing the issue.

Being quite a newbie here, let me ask a little naive question: How long would it take from your point of view approximately until Rtools gets to GDAL >= 3.4.0?

In the meantime, I would still be able to make use of rhdf5 - so this should not be that critical. However, since these are my first attempts to work with these containers - and as far as I understand, there are lots of flavors of hdf5 - is there some kind of a standard how spatial information attributes are be named and where to be placed in order to make the file readable for e.g. GDAL?

I'm asking because the information seems to be available per se (crs + origin + ncols/nrows + cellsize should add up to the extent) according to gdalinfo, but is named quite unconventionally from my personal point of view:

#[9] "  dataset_DXk_what_nodata=-999 "

#[37] "  where_projdef=UTM_N32"
#[41] "  where_UL_ipixel=195 "
#[42] "  where_UL_jpixel=5943 "
#[43] "  where_xscale_[m]=1000 "
#[44] "  where_xsize=536 "
#[45] "  where_yscale_[m]=1000 "
#[46] "  where_ysize=536 "

Thanks again!

dimfalk avatar Jun 27 '22 21:06 dimfalk

After a bit more digging, I now see that:

  • This also works on OSX (in my case with GDAL 3.2.2 )
  • RTools4.2 (which I was not using) has GDAL 3.4 but this does not change things for terra or stars. Both still fail to read the file values. (and note that to update GDAL for stars, you have to update it for sf ).

So, given that the file can be read on linux and OSX, but not with RTools GDAL on windows, it seems more likely an HDF5/GDAL building issue; perhaps similar to the issue discussed here: https://github.com/OSGeo/gdal/issues/1428. I am adding @jeroen who runs some of this, but I realize that he may not be able to do much with the limited information at hand.

As for the extent and crs. You can set the correct extent after you open the file, by extracting it from the metadata (I do not know how but I suppose that is documented somewhere), just like I show for the NAflag. It won't be automatic because the way it is stored does not appear to follow a more standard approach.

rhijmans avatar Jun 27 '22 22:06 rhijmans

No idea if this is relevant but I was also able to display the data in QGIS 3.20.1-Odense using GDAL 3.3.1 without any problems - making it seem like this is not an issue being fixed automatically with GDAL >= 3.4.0.

dimfalk avatar Jun 28 '22 08:06 dimfalk

Can you test if the problem appears on both R-4.1 and R-4.2? Note that R-4.2 uses a different GDAL (from Tomas Kaliber) than R-4.1 (from rwinlib).

jeroen avatar Jun 28 '22 08:06 jeroen

Yeah, sure.

library(terra)
#> Warning: Paket 'terra' wurde unter R Version 4.1.3 erstellt
#> terra 1.5.34

terra::rast("hd2011010000.scu")
#> Warning: [rast] unknown extent
#> class       : SpatRaster 
#> dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#> resolution  : 0.001865672, 0.001865672  (x, y)
#> extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source      : hd2011010000.scu 
#> name        : hd2011010000

terra::rast("hd2011010000.scu") |> plot()
#> Warning: [rast] unknown extent
#> Warning: H5Dread() failed for block. (GDAL error 1)
#> Warning: HDF5:"hd2011010000.scu":
#> //dataset_DXk/image, band 1: IReadBlock failed at X offset 0,
#> Y offset 0: H5Dread() failed for block. (GDAL error 1)
#> Error: [readValues] cannot read values
library(stars)
#> Warning: Paket 'stars' wurde unter R Version 4.1.3 erstellt
#> Lade nötiges Paket: abind
#> Warning: Paket 'abind' wurde unter R Version 4.1.1 erstellt
#> Lade nötiges Paket: sf
#> Warning: Paket 'sf' wurde unter R Version 4.1.3 erstellt
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE

stars::read_stars("hd2011010000.scu")
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 1: H5Dread() failed for block.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 1: HDF5:"hd2011010000.scu"://dataset_DXk/
#> image, band 1: IReadBlock failed at X offset 0, Y offset 0: H5Dread() failed for
#> block.
#> Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver), : read failure
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17763)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
#> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] stars_0.5-5  sf_1.0-7     abind_1.4-5  terra_1.5-34
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3       compiler_4.1.0     pillar_1.7.0       highr_0.9         
#>  [5] class_7.3-20       tools_4.1.0        digest_0.6.29      tibble_3.1.7      
#>  [9] evaluate_0.15      lifecycle_1.0.1    pkgconfig_2.0.3    rlang_1.0.2       
#> [13] reprex_2.0.1       DBI_1.1.3          cli_3.3.0          rstudioapi_0.13   
#> [17] parallel_4.1.0     yaml_2.3.5         xfun_0.31          fastmap_1.1.0     
#> [21] e1071_1.7-11       withr_2.5.0        stringr_1.4.0      dplyr_1.0.9       
#> [25] knitr_1.39         generics_0.1.2     fs_1.5.2           vctrs_0.4.1       
#> [29] tidyselect_1.1.2   classInt_0.4-7     grid_4.1.0         glue_1.6.2        
#> [33] R6_2.5.1           fansi_1.0.3        rmarkdown_2.14     purrr_0.3.4       
#> [37] magrittr_2.0.3     codetools_0.2-18   htmltools_0.5.2    ellipsis_0.3.2    
#> [41] units_0.8-0        utf8_1.2.2         KernSmooth_2.23-20 stringi_1.7.6     
#> [45] proxy_0.4-27       lwgeom_0.2-8       crayon_1.5.1

dimfalk avatar Jun 28 '22 15:06 dimfalk

I also see this in R 4.1 and 4.3 --- both on windows

# R version 4.1.3 (2022-03-10) -- "One Push-Up"
library(terra)
#terra 1.5.48
gdal()
#[1] "3.4.1"
r = rast("hd2011010000.scu")
Warning message:
#[rast] unknown extent
r * 1
#Error: [*] cannot read from hd2011010000.scu
#R Under development (unstable) (2022-06-26 r82524 ucrt)
library(terra)
#terra 1.5.48
gdal()
#[1] "3.4.3"
r = rast("hd2011010000.scu")
#Warning message:
#[rast] unknown extent
r * 1
#Error: [*] cannot read from hd2011010000.scu

rhijmans avatar Jun 28 '22 18:06 rhijmans

Guess I am not contributing anything significantly new to this discussion but since I found some time to mess around with Linux, I can at least confirm that reading the data works on Ubuntu 20.04 with R 4.1.3 as well as R 4.2.1 using both, terra and stars, in contrast to the behaviour on windows (s. above).

# R version 4.1.3 (2022-03-10) -- "One Push-Up"
library(terra)
#terra 1.5.34
gdal()
#[1] "3.0.4"
r = rast("hd2011010000.scu")
Warning message:
#[rast] unknown extent
r * 1
#class       : SpatRaster 
#dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#resolution  : 0.001865672, 0.001865672  (x, y)
#extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source      : memory 
#name        : hd2011010000 
#min value   :         -999 
#max value   :           15 
# R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
library(terra)
#terra 1.5.34
gdal()
#[1] "3.0.4"
r = rast("hd2011010000.scu")
Warning message:
#[rast] unknown extent
r * 1
#class       : SpatRaster 
#dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#resolution  : 0.001865672, 0.001865672  (x, y)
#extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source      : memory 
#name        : hd2011010000 
#min value   :         -999 
#max value   :           15 

Since I feel like this is OS-related and not an issue with terra (or stars) per se, I do not know if this is really the right place to investigate (is it e.g. possible to move issues to another repo?)... However, if there is a way I might help, please let me know, @jeroen.

Thank you very much in advance!

dimfalk avatar Jul 11 '22 16:07 dimfalk

Perhaps @roualt can comment?

rhijmans avatar Jul 14 '22 23:07 rhijmans

followup from https://github.com/r-spatial/sf/issues/1995#issuecomment-1250199092

Windows HDF5 version seems to be 1.12.0 https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/toolchain_libs/mxe/src/hdf5.mk, confirmed in Rtools42 include/H5pubconf.h.

Fedora 36, HDF5 1.12.1, GDAL 3.5.2:

> library(terra)
terra 1.6.18
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent
 
> x
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : hd2011010000.scu 
name        : hd2011010000 
> y <- x + 1
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : memory 
name        : hd2011010000 
min value   :         -998 
max value   :           16 

Windows 10, my binary build of terra from the sf issue:

> library(terra)
terra 1.6.18
> gdal()
[1] "3.5.0"
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent
 
> x
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : hd2011010000.scu 
name        : hd2011010000 
> y <- x * 1
|---------|---------|---------|---------|
=========================================
                                          
Warning messages:
1: H5Dread() failed for block. (GDAL error 1) 
2: HDF5:"C:/Users/RB/work/hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X offset 0, Y offset 0: H5Dread() failed for block. (GDAL error 1) 
> 
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
> 

No ideas from me, never use HDF at all; if HDF5 is not UCRT-ready, or uses std::regex, it might fail if the problem only occurred R >= 4.2, but is reported above in R 4.1 on Windows.

For completeness, R 4.2.1, HDF5 1.12.0 macOS arm64:

> library(terra)
terra 1.6.7
> gdal()
[1] "3.4.2"
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent
 
> #Warning message:
> #[rast] unknown extent
> y <- x * 1
> 
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 0.001865672, 0.001865672  (x, y)
extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : memory 
name        : hd2011010000 
min value   :         -999 
max value   :           15 

rsbivand avatar Sep 18 '22 12:09 rsbivand

Windows OSGeo4W:

>gdalinfo  hd2011010000.scu
Driver: netCDF/Network Common Data Format
Files: hd2011010000.scu
Size is 536, 536
Metadata:
  /dataset_DXk/image#IMAGE_BACKGROUNDINDEX=0
  /dataset_DXk/image#IMAGE_COLORMODEL=RGB
  /dataset_DXk/image#IMAGE_MINMAXRANGE={0,0}
  /dataset_DXk/image#IMAGE_SUBCLASS=BITMAP
  /dataset_DXk/image#IMAGE_VERSION=1.2
  /dataset_DXk/what/NC_GLOBAL#elev [deg]=0.5
  /dataset_DXk/what/NC_GLOBAL#enddate [YYYYMMDD]=20201101
  /dataset_DXk/what/NC_GLOBAL#endtime [HHmmss]=000000
  /dataset_DXk/what/NC_GLOBAL#gain=1
  /dataset_DXk/what/NC_GLOBAL#nodata=-999
  /dataset_DXk/what/NC_GLOBAL#ori_format=DX
  /dataset_DXk/what/NC_GLOBAL#ori_name=HD2011010000.scu
  /dataset_DXk/what/NC_GLOBAL#product=COMP
  /dataset_DXk/what/NC_GLOBAL#quantity=RATE
  /dataset_DXk/what/NC_GLOBAL#rad=1
  /dataset_DXk/where/NC_GLOBAL#xsize=536
  /dataset_DXk/where/NC_GLOBAL#ysize=536
  /how/NC_GLOBAL#a1gate=1512001264
  /how/NC_GLOBAL#adjustment=1
  /how/NC_GLOBAL#angles [deg]=0.5
  /how/NC_GLOBAL#doppler=4
  /how/NC_GLOBAL#maxlevel [dBZ]=65
  /how/NC_GLOBAL#minlevel [dBZ]=0
  /how/NC_GLOBAL#nodes=ESS,NHB,FLD,
  /how/NC_GLOBAL#nodesn=3
  /how/NC_GLOBAL#place=fld
  /how/NC_GLOBAL#resolution [dBZ]=0.5
  /how/NC_GLOBAL#wavelength [cm]=5
  /how/NC_GLOBAL#WMO=121
  /how/NC_GLOBAL#zr_a=256
  /how/NC_GLOBAL#zr_b=1.42
  /what/NC_GLOBAL#date [YYYYMMDD]=20201101\
  /what/NC_GLOBAL#object=IMAGE
  /what/NC_GLOBAL#sets=1
  /what/NC_GLOBAL#time [HHmmss]=000000\
  /what/NC_GLOBAL#version [H5Vol 0.2]=SCP_BIG 5.0.11.24
  /where/NC_GLOBAL#poserror=
  /where/NC_GLOBAL#projdef=UTM_N32
  /where/NC_GLOBAL#range=128
  /where/NC_GLOBAL#start_lat [deg]=5943
  /where/NC_GLOBAL#start_lon [deg]=195
  /where/NC_GLOBAL#UL_ipixel=195
  /where/NC_GLOBAL#UL_jpixel=5943
  /where/NC_GLOBAL#xscale [m]=1000
  /where/NC_GLOBAL#xsize=536
  /where/NC_GLOBAL#yscale [m]=1000
  /where/NC_GLOBAL#ysize=536
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  536.0)
Upper Right (  536.0,    0.0)
Lower Right (  536.0,  536.0)
Center      (  268.0,  268.0)
Band 1 Block=64x72 Type=Float32, ColorInterp=Undefined
  Metadata:
    IMAGE_BACKGROUNDINDEX=0
    IMAGE_COLORMODEL=RGB
    IMAGE_MINMAXRANGE={0,0}
    IMAGE_SUBCLASS=BITMAP
    IMAGE_VERSION=1.2
    NETCDF_VARNAME=image
There are 2 HDF5 objects open!
>gdal_translate -of GTiff hd2011010000.scu hd2011010000.tif
Input file size is 536, 536
0...10...20...30...40...50...60...70...80...90...100 - done.
There are 2 HDF5 objects open!

Report: open objects on 72057594037927936
Type = File(72057594037927936) name='/'Type = Attribute(432345564227567616) name='CLASS'ERROR 1: netcdf error #-101 : NetCDF: HDF error .
at (D:\src\osgeo4w\src\gdal\gdal-3.5.1\frmts\netcdf\netcdfdataset.cpp,netCDFDataset::~netCDFDataset,2847)

Report: open objects on 72057594037927936
Type = File(72057594037927936) name='/'Type = Attribute(432345564227567616) name='CLASS'ERROR 1: netcdf error #-101 : NetCDF: HDF error .
at (D:\src\osgeo4w\src\gdal\gdal-3.5.1\frmts\netcdf\netcdfdataset.cpp,netCDFDataset::~netCDFDataset,2847)

The file converted with OSGeo4W's gdal_translate may or may not be complete, gdalinfo is showing errors of some kind. This is not the GDAL build used by Rtools42.

> library(terra)
terra 1.6.18
> x <- rast("hd2011010000.tif")
Warning message:
[rast] unknown extent
 
> y <- x * 1
|---------|---------|---------|---------|
=========================================
                                          
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : memory 
name        : hd2011010000 
min value   :         -999 
max value   :           15 
>

rsbivand avatar Sep 18 '22 16:09 rsbivand

It appears that the cause of the problem is that the windows HDF5 library was built without the "szip" library. That library is needed to read this file (presumably because it was compressed using szip). See https://github.com/r-spatial/sf/issues/1995#issuecomment-1250364222 by @kalibera.

rhijmans avatar Sep 18 '22 19:09 rhijmans

@rhijmans I think that the problem is that szip has had an unfortunate license: https://support.hdfgroup.org/doc_resource/SZIP/. I'm unsure whether the HDF5 driver should expect to write to a file created with szip. It is an open question whether packagers of hdf5 should build with szip read/write or just default read. Could some check what for example Debian says about szip?

rsbivand avatar Sep 19 '22 11:09 rsbivand

Homebrew has replaced szip with libaec, see: https://gitlab.dkrz.de/k202009/libaec/-/blob/master/README.SZIP

jeroen avatar Sep 19 '22 11:09 jeroen

Fedora also distributes libaec, and HDF5 is built using it, so that explains being able to write to Szip-encoded files. MXE has aec, and the hdf5 MXE build now uses it. Untested with the latest MXE development versions.

rsbivand avatar Sep 20 '22 20:09 rsbivand

With current library binaries in https://www.r-project.org/nosvn/winutf8/ucrt3/gcc10_ucrt3_full_5349.tar.zst, copied into Rtools42, locally manually patched terra/src/Makevars.ucrt (https://www.r-project.org/nosvn/winutf8/ucrt3/patches/CRAN/terra.diff), and r-devel 5349 :

> library(terra)
terra 1.6.21
> gdal()
[1] "3.5.2"
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent
 
> y <- x * 1
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : memory 
name        : hd2011010000 
min value   :         -999 
max value   :           15 
>

So when the next release of Rtools occurs, libael will be used when building HDF5. Thanks to @kalibera for adding AEC to the Rtools42 MXE build tree.

rsbivand avatar Sep 21 '22 08:09 rsbivand

Thank you very much @kalibera, @jeroen and @rsbivand!

rhijmans avatar Sep 21 '22 22:09 rhijmans

Thank you very much to everyone for digging into this!

dimfalk avatar Sep 22 '22 09:09 dimfalk