geotrellis
Cannot read the Tiff file by GDALRasterSource. Unable to construct dataset dimensions. GDAL Error Code: 4
Describe the bug
Cannot read the Tiff file by GDALRasterSource. Unable to construct dataset dimensions. GDAL Error Code: 4
To Reproduce
- Code example
package com.example.gdalread

import cats.syntax.option._
import com.azavea.gdal.GDALWarp
import geotrellis.layer.{FloatingLayoutScheme, KeyExtractor, LayoutLevel, SpatialKey}
import geotrellis.proj4.LatLng
import geotrellis.raster.RasterSource
import geotrellis.raster.gdal._
import geotrellis.spark.{MultibandTileLayerRDD, RasterSourceRDD, RasterSummary}
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object gdal_read_test {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)

    // Loads GDAL into the driver JVM only; executors rely on spark.executorEnv.LD_LIBRARY_PATH below
    System.load("/root/anaconda3/envs/gdal-3.1.2/lib/libgdal.so.27")
    println(s"java.library.path: ${System.getProperty("java.library.path")}")
    GDALWarp.init(100)

    println("entering gdal_read_test")
    val startTime = System.currentTimeMillis()

    implicit val conf: SparkConf =
      new SparkConf()
        .setAppName("gdal_read_test")
        .setMaster("spark://master:7077")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "geotrellis.spark.store.kryo.KryoRegistrator")
        .set("spark.executor.cores", "6")
        .set("spark.executor.memory", "4g")
        // In client mode the driver JVM is already running, so this has no effect; use spark-submit --driver-memory
        .set("spark.driver.memory", "2g")
        // spark.executor.instances is the setting behind spark-submit's --num-executors
        .set("spark.executor.instances", "3")
        .set("spark.cores.max", "20")
        .set("spark.executorEnv.LD_LIBRARY_PATH", "/root/anaconda3/envs/gdal-3.1.2/lib/:/usr/local/lib")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.default.parallelism", "600")
        .set("spark.sql.shuffle.partitions", "600")
    implicit val sc: SparkContext = new SparkContext(conf)

    val path = "/geo/file/raster/RS/Landsat/L71149033_03320030531_B10.TIF"
    println(s"path: $path")
    val targetCRS = LatLng
    println(s"targetCRS: $targetCRS")

    val tileSize = 256
    val layoutScheme = FloatingLayoutScheme(tileSize)

    // GDALRasterSource reads metadata lazily; in this job it is first needed inside
    // RasterSummary.fromRDD on an executor, so GDAL must be able to open the path on every worker
    val sourceRDD: RDD[RasterSource] = sc.parallelize(Seq(GDALRasterSource(path)))
    val summary = RasterSummary.fromRDD(sourceRDD)
    val LayoutLevel(_, layout) = summary.levelFor(layoutScheme)
    val contextRDD: MultibandTileLayerRDD[SpatialKey] =
      RasterSourceRDD.tiledLayerRDD(sourceRDD, layout, KeyExtractor.spatialKeyExtractor, rasterSummary = summary.some)

    // Sum of the first band for every tile
    val tileSums: RDD[Int] = contextRDD.map { case (_, tile) => tile.band(0).toArray().sum }
    val sums = tileSums.collect()
    println(s"result: ${sums.mkString(", ")}")

    val endTime = System.currentTimeMillis()
    println(s"total time is ${(endTime - startTime) / 1000} s")
    sc.stop()
  }
}
- Actual output
  - an error is encountered when reading any TIFF
- Expected output
  - read the TIFF and output the sum of the values of each tile
Environment
- OS: CentOS Linux release 7.9.2009 (Core)
- Java version: java version "11.0.12" 2021-07-20 LTS, Java(TM) SE Runtime Environment 18.9 (build 11.0.12+8-LTS-237), Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.12+8-LTS-237, mixed mode)
- Scala version: 2.12.8
- GeoTrellis version: 3.5.2
Additional context
Attachments: bugreport.zip, bugreport2.zip, bugreport3.zip, bugreport4.zip
Hey @qw845602, could you minimize the example? For instance:
GDALRasterSource("path/to/tiff").rasterExtent
The other thing is GDAL Error Code: 4: it can mean many things, so could you post a stack trace here as well? It usually shows, further down, which function caused the problem.
Also, for context from Gitter: there is a good chance that GDAL is improperly installed or java.library.path is improperly set, and the two may be connected.
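A minimal, standalone check along the lines suggested above might look like the sketch below. It assumes GeoTrellis 3.x with `geotrellis-gdal` on the classpath; the object name `MinimalGdalCheck`, the helper `libraryPathEntries`, and the default path are hypothetical. It prints the `java.library.path` entries and then touches the source metadata, which forces GDAL to open the dataset, so an installation or path problem should surface as the same `MalformedDataException` without Spark involved.

```scala
import java.io.File
import java.util.regex.Pattern

import geotrellis.raster.gdal.GDALRasterSource

object MinimalGdalCheck {
  // Hypothetical helper: split a java.library.path-style string into its non-empty entries
  def libraryPathEntries(value: String, separator: String = File.pathSeparator): Seq[String] =
    value.split(Pattern.quote(separator)).toSeq.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Show where this JVM looks for native libraries such as the GDAL bindings
    libraryPathEntries(System.getProperty("java.library.path", ""))
      .foreach(entry => println(s"java.library.path entry: $entry"))

    // Hypothetical default; pass a TIFF path that exists on this machine's local file system
    val path = args.headOption.getOrElse("/path/to/file.tif")
    val source = GDALRasterSource(path)

    // Reading metadata forces GDAL to open the dataset; a missing file or a broken
    // GDAL installation fails here rather than deep inside a Spark job
    println(s"crs=${source.crs} extent=${source.extent}")
    println(s"cols=${source.cols} rows=${source.rows} bands=${source.bandCount}")
  }
}
```

Running it on one worker node with the same path as the Spark job separates "GDAL cannot open this file here" from any tiling or layout problem.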
What is a stack trace? Do you just mean indicating which function causes the problem or where the error occurs?
The stack trace is the actual error, including the stack of function calls; you already sent it in Gitter.
Ok, here it is:
Caused by: geotrellis.raster.gdal.MalformedDataException: Unable to construct dataset dimensions. GDAL Error Code: 4
at geotrellis.raster.gdal.GDALDataset$.$anonfun$dimensions$1(GDALDataset.scala:160)
at geotrellis.raster.gdal.GDALDataset$.$anonfun$dimensions$1$adapted(GDALDataset.scala:157)
at geotrellis.raster.gdal.GDALDataset$.errorHandler$extension(GDALDataset.scala:406)
at geotrellis.raster.gdal.GDALDataset$.dimensions$extension1(GDALDataset.scala:157)
at geotrellis.raster.gdal.GDALDataset$.rasterExtent$extension1(GDALDataset.scala:197)
at geotrellis.raster.gdal.GDALRasterSource.gridExtent$lzycompute(GDALRasterSource.scala:93)
at geotrellis.raster.gdal.GDALRasterSource.gridExtent(GDALRasterSource.scala:93)
at geotrellis.raster.RasterMetadata.extent(RasterMetadata.scala:52)
at geotrellis.raster.RasterMetadata.extent$(RasterMetadata.scala:52)
at geotrellis.raster.RasterSource.extent(RasterSource.scala:43)
at geotrellis.spark.RasterSummary$.$anonfun$collect$1(RasterSummary.scala:108)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:194)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:834)
The error was uploaded in bugreport2. I didn't find a GDALRasterSource("path/to/tiff").rasterExtent function, so I used GDALRasterSource("path/to/tiff").dimensions instead. The error is "[1 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." /geo/file/raster/RS/Landsat/L71149033_03320030531_B10.TIF no such file or directory". It seems that GDAL could not find the TIFF. Note that the TIFF file is stored in HDFS.
@qw845602 well, that's a different issue: you'd need a GDAL build with HDFS support; I don't believe it's enabled by default.
Also, even if it works, this approach may add overhead, caused by the extra JVM that GDAL will create to establish the HDFS connection.
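For reference, GDAL reaches HDFS through its `/vsihdfs/` virtual file system, which is only available when GDAL itself was built with HDFS (libhdfs) support and can find a JVM and the Hadoop jars at runtime. A minimal sketch of addressing a file that way, assuming such a GDAL build on every node and assuming GeoTrellis passes a path that already starts with `/vsi` through to GDAL unchanged; the helper `toVsiHdfs`, the object name, and the namenode address are hypothetical.

```scala
import geotrellis.raster.gdal.GDALRasterSource

object HdfsViaGdal {
  // Hypothetical helper: prefix an HDFS URI with GDAL's /vsihdfs/ handler
  def toVsiHdfs(hdfsUri: String): String = {
    require(hdfsUri.startsWith("hdfs://"), s"expected an hdfs:// URI, got $hdfsUri")
    s"/vsihdfs/$hdfsUri"
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical namenode address; the file path is the one from this issue
    val uri = "hdfs://namenode:8020/geo/file/raster/RS/Landsat/L71149033_03320030531_B10.TIF"
    val source = GDALRasterSource(toVsiHdfs(uri))
    // Only succeeds if the GDAL build on this node has HDFS support enabled
    println(s"extent=${source.extent} cols=${source.cols} rows=${source.rows}")
  }
}
```

If this fails with the same "Open failed" message, the GDAL build most likely lacks HDFS support, which matches the diagnosis above.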
Can GDALRasterSource read a TIFF directly from the local disk in the Spark cluster? Does the TIFF need to be on each node at the same location? I remember that HadoopGeotiffRDD could not read a TIFF file from the local disk in Spark cluster mode.
@qw845602 yes, both HadoopGeotiffRDD and GDALRasterSource can read files directly from cluster local disks; in this case you'd need to have copies on every node.
We've never encountered these issues since we rely mostly on S3 storage, and GDAL supports S3 reads by default.
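When reading from local disks, every executor has to see the file at the same path. One way to check that is sketched below, assuming that running many small tasks spreads them across most executors (Spark does not guarantee task placement, so a host that ran no task is simply not reported); the object name `LocalCopyCheck` and the helper `missingOn` are hypothetical.

```scala
import java.io.File
import java.net.InetAddress

import org.apache.spark.{SparkConf, SparkContext}

object LocalCopyCheck {
  // Hypothetical helper: hosts that reported the file as missing, deduplicated and sorted
  def missingOn(reports: Seq[(String, Boolean)]): Seq[String] =
    reports.collect { case (host, false) => host }.distinct.sorted

  def main(args: Array[String]): Unit = {
    val path = args.headOption.getOrElse("/geo/file/raster/RS/Landsat/L71149033_03320030531_B10.TIF")
    val sc = new SparkContext(new SparkConf().setAppName("local-copy-check"))
    // Many small partitions so that, in practice, tasks land on most executors
    val partitions = sc.defaultParallelism * 4
    val reports = sc
      .parallelize(0 until partitions, partitions)
      .map(_ => (InetAddress.getLocalHost.getHostName, new File(path).exists()))
      .collect()
      .toSeq
    val missing = missingOn(reports)
    if (missing.isEmpty) println(s"$path found on every host that ran a task")
    else println(s"$path missing on: ${missing.mkString(", ")}")
    sc.stop()
  }
}
```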
Yes, I have tried reading from the cluster local disks; the code and error are in bugreport3. The error is "java.lang.IllegalArgumentException: requirement failed: x-aligned: offset by CellSize". The same error also occurs when the code is run in local mode, which I mentioned earlier in the Gitter thread.
@qw845602 could you post minimized code to reproduce requirement failed: x-aligned: offset by CellSize? I believe this is related to tiling to layout, though, not to reading.
@qw845602 let me summarize:
1. There are problems with the GDAL installation on the cluster: the TIFFs are located on HDFS, so they are problematic to read, and that definitely explains the GDAL Error 4 that you had.
2. The solution to that is to have GDAL with HDFS support installed on all nodes, and to configure them so that GDAL has access to HDFS.
3. When trying local reads, GDALRasterSource works; however, you experience issues when performing tiling to layout.
  - There are indeed some issues related to GDAL reads and tiling, captured here: LayoutTileSource.requireGridAligned is failing with GDALRasterSource #3292 (https://github.com/locationtech/geotrellis/issues/3292)
4. The initial reasons why GDAL is used:
  - Too large file sizes, which may trigger Read single-band TIFF files, large than 2G #3065 (https://github.com/locationtech/geotrellis/issues/3065)
  - Too large segments (all TIFFs you work with are striped): Potential Issue With GeoTiff Reading in the Future due to too large segments dimensions #1691 (https://github.com/locationtech/geotrellis/issues/1691)
Is it a correct summary?
1 and 2 are correct. For point 3, I am not quite sure whether it is related to tiling to layout. I found that a layout scheme needs to be specified in https://github.com/pomadchin/vlm-performance/blob/feature/gt-3.x/src/main/scala/geotrellis/contrib/performance/IngestRasterSource.scala#L52:L59. I only know two types of layout scheme, ZoomedLayoutScheme and FloatingLayoutScheme. Since the TIFF needs to be processed as a pyramid, I chose FloatingLayoutScheme. Are there any other ways to create a "TileLayerRDD[SpatialKey]" using GDALRasterSource? I have read the link in point 3, but I have not found a solution there. For point 4, yes, the TIFF file is very large, about several hundred GB, but I am not quite sure about the reason. It fails with an ArrayIndexOutOfBoundsException when using HadoopGeotiffRdd.
@qw845602 yea, 3. is exactly about it; :+1:
I'm afraid there are no quick or easy solutions to your problem: either figure out the GDAL issues and get really deep into them, or use GDAL to convert the TIFFs into tiled and compressed TIFFs: gdal_translate in.tif out.tif -co TILED=YES -co COMPRESS=LZW
The last one would not hurt to try, at least to check that it can work as expected with your data.
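To confirm that a converted file is actually tiled rather than striped, its header can be inspected with GeoTrellis's own GeoTiff reader. A sketch assuming GeoTrellis 3.x, where `GeoTiffReader.readMultiband(path, streaming = true)` reads the header and segment layout without decoding all pixels; the object name `StorageCheck`, the helper `describe`, and the default path are hypothetical.

```scala
import geotrellis.raster.io.geotiff.{StorageMethod, Striped, Tiled}
import geotrellis.raster.io.geotiff.reader.GeoTiffReader

object StorageCheck {
  // Hypothetical helper: name the TIFF segment layout
  def describe(storage: StorageMethod): String = storage match {
    case _: Tiled   => "tiled"
    case _: Striped => "striped"
  }

  def main(args: Array[String]): Unit = {
    val path = args.headOption.getOrElse("out.tif") // hypothetical default
    // streaming = true reads the header and segment offsets, not the full raster
    val tiff = GeoTiffReader.readMultiband(path, streaming = true)
    val tile = tiff.tile
    println(s"$path is ${describe(tiff.options.storageMethod)}, ${tile.cols}x${tile.rows}, ${tile.bandCount} band(s)")
  }
}
```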
I have translated the TIFF using the command gdal_translate in.tif out.tif -co TILED=YES -co COMPRESS=LZW; however, the error "java.lang.IllegalArgumentException: requirement failed: x-aligned: offset by CellSize" still exists. It is very strange.
@qw845602 is that with non-GDAL reads? Try it without GDAL.
I have uploaded the translated TIFF, together with the code and the error, in bugreport4. I have also tried ZoomedLayoutScheme, but it causes the same error, so I don't know how to deal with the layout scheme.
How do I try it without GDAL?
An error occurred while uploading bugreport4; it has now been uploaded successfully. Does that mean I need to translate the TIFF that caused the ArrayIndexOutOfBoundsException and see whether it can be read by HadoopGeoTiffRDD?
@qw845602 yes, you may try HadoopGeoTiffRDD, but you can also replace GDALRasterSource with RasterSource; it will use the non-GDAL underlying reader.
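Swapping the reader in the original reproduction only changes how the sources are constructed; the rest of the `RasterSourceRDD` pipeline stays the same. A sketch assuming GeoTrellis 3.x, using `GeoTiffRasterSource` (GeoTrellis's own GeoTiff reader) explicitly instead of relying on `RasterSource(path)` to pick a provider; the object name `ReaderChoice` and the `useGdal` flag are hypothetical.

```scala
import geotrellis.raster.RasterSource
import geotrellis.raster.gdal.GDALRasterSource
import geotrellis.raster.geotiff.GeoTiffRasterSource

object ReaderChoice {
  // Hypothetical switch: build either a GDAL-backed or a pure-JVM GeoTiff source for the same path.
  // Neither constructor opens the file; that happens when metadata or tiles are first requested.
  def open(path: String, useGdal: Boolean): RasterSource =
    if (useGdal) GDALRasterSource(path) else GeoTiffRasterSource(path)
}

// In the reproduction, the only line that would change is the source construction:
// val sourceRDD: RDD[RasterSource] = sc.parallelize(Seq(ReaderChoice.open(path, useGdal = false)))
```

Keeping both readers behind one function makes it easy to tell whether an error such as "x-aligned: offset by CellSize" follows the GDAL reader or the layout step.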
Yes, it works with the command "gdal_translate in.tif out.tif -co BIGTIFF=YES -co TILED=YES -co COMPRESS=LZW". After translating the TIFF, I can read it as an RDD using hadoopGeoTiffRDD.
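Once the TIFF is re-encoded, the HadoopGeoTiffRDD route can produce the same kind of `MultibandTileLayerRDD[SpatialKey]` that the original reproduction built with `RasterSourceRDD`. A sketch assuming GeoTrellis 3.x APIs (`HadoopGeoTiffRDD.spatialMultiband`, `collectMetadata`, `tileToLayout`, `ContextRDD`); the object name, the helper `bandSum`, and the input path are hypothetical, and `FloatingLayoutScheme(256)` mirrors the tile size used in the reproduction.

```scala
import geotrellis.layer.{FloatingLayoutScheme, SpatialKey}
import geotrellis.raster.MultibandTile
import geotrellis.raster.resample.Bilinear
import geotrellis.spark._
import geotrellis.spark.store.hadoop.HadoopGeoTiffRDD
import geotrellis.vector.ProjectedExtent
import org.apache.hadoop.fs.Path
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object TiledFromHadoopGeoTiff {
  // Hypothetical helper: sum one band as Long, which avoids Int overflow on large tiles
  def bandSum(tile: MultibandTile, band: Int): Long =
    tile.band(band).toArray().map(_.toLong).sum

  def main(args: Array[String]): Unit = {
    implicit val sc: SparkContext = new SparkContext(new SparkConf().setAppName("tiled-from-hadoop-geotiff"))
    // Hypothetical location of the gdal_translate output (BIGTIFF=YES, TILED=YES, COMPRESS=LZW)
    val input = new Path(args.headOption.getOrElse("hdfs:///path/to/out.tif"))

    // Records pair a ProjectedExtent (extent + CRS) with the pixels read for it
    val raw: RDD[(ProjectedExtent, MultibandTile)] = HadoopGeoTiffRDD.spatialMultiband(input)

    // Derive the layout from the data, then cut the records into 256x256 layout tiles
    val (_, metadata) = raw.collectMetadata[SpatialKey](FloatingLayoutScheme(256))
    val tiled: RDD[(SpatialKey, MultibandTile)] = raw.tileToLayout(metadata.cellType, metadata.layout, Bilinear)
    val layer: MultibandTileLayerRDD[SpatialKey] = ContextRDD(tiled, metadata)

    // Same per-tile computation as the reproduction: sum of the first band
    val sums = layer.map { case (_, tile) => bandSum(tile, 0) }.collect()
    println(s"tiles: ${sums.length}, total: ${sums.sum}")
    sc.stop()
  }
}
```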
@pomadchin Hello, I'm currently working on using the geotrellis-server project to publish a WMTS service. I'm providing a data link as the source: "file:///E:/Geotrellis/Tiles/attributes?layers=tiles&zoom=10&band_count=1". Under this path, I have tile data pre-cut with GeoTrellis.
I'm using Scala 2.12.8, GeoTrellis 3.6.1, and GDAL 3.0.4 on Windows. My stack trace is as follows:
17:36:31.296 [raster-io-0] DEBUG geotrellis.server.ogc.Main - GetCapabilities: /?SERVICE=WMS&REQUEST=GetCapabilities
17:36:31.369 [raster-io-0] ERROR org.http4s.server.service-errors - Error servicing request: GET / from 127.0.0.1
geotrellis.raster.gdal.MalformedDataException: Unable to construct dataset dimensions. GDAL Error Code: 4
at geotrellis.raster.gdal.GDALDataset$.$anonfun$dimensions$1(GDALDataset.scala:160)
at geotrellis.raster.gdal.GDALDataset$.$anonfun$dimensions$1$adapted(GDALDataset.scala:157)
at geotrellis.raster.gdal.GDALDataset$.errorHandler$extension(GDALDataset.scala:422)
at geotrellis.raster.gdal.GDALDataset$.dimensions$extension1(GDALDataset.scala:157)
at geotrellis.raster.gdal.GDALDataset$.rasterExtent$extension1(GDALDataset.scala:197)
at geotrellis.raster.gdal.GDALRasterSource.gridExtent$lzycompute(GDALRasterSource.scala:93)
at geotrellis.raster.gdal.GDALRasterSource.gridExtent(GDALRasterSource.scala:93)
at geotrellis.server.ogc.wms.CapabilitiesView$.$anonfun$modelAsLayer$2(CapabilitiesView.scala:277)
at scala.collection.immutable.List.map(List.scala:293)
at geotrellis.server.ogc.wms.CapabilitiesView$.$anonfun$modelAsLayer$1(CapabilitiesView.scala:265)
at map @ geotrellis.server.ogc.wms.CapabilitiesView$.modelAsLayer(CapabilitiesView.scala:264)
at mapN @ geotrellis.server.ogc.wms.CapabilitiesView$.modelAsLayer(CapabilitiesView.scala:291)
at mapN @ geotrellis.server.ogc.wms.CapabilitiesView$.modelAsLayer(CapabilitiesView.scala:291)
at map @ geotrellis.server.ogc.wms.CapabilitiesView.toXML(CapabilitiesView.scala:111)
at flatMap @ geotrellis.server.ogc.wms.WmsView.$anonfun$responseFor$5(WmsView.scala:142)
at delay @ io.chrisdavenport.log4cats.slf4j.internal.Slf4jLoggerInternal$Slf4jLogger.$anonfun$debug$4(Slf4jLoggerInternal.scala:68)
at delay @ io.chrisdavenport.log4cats.slf4j.internal.Slf4jLoggerInternal$Slf4jLogger.isDebugEnabled(Slf4jLoggerInternal.scala:50)
at ifM$extension @ io.chrisdavenport.log4cats.slf4j.internal.Slf4jLoggerInternal$Slf4jLogger.info(Slf4jLoggerInternal.scala:76)
at >>$extension @ geotrellis.server.ogc.wms.WmsView.responseFor(WmsView.scala:141)
at sequence @ org.http4s.HttpRoutes$.$anonfun$of$2(HttpRoutes.scala:79)
at defer @ org.http4s.HttpRoutes$.$anonfun$of$1(HttpRoutes.scala:79)
at $anonfun$combineK$1 @ org.http4s.syntax.KleisliResponseOps.$anonfun$orNotFound$1(KleisliSyntax.scala:49)
at getOrElse @ org.http4s.syntax.KleisliResponseOps.$anonfun$orNotFound$1(KleisliSyntax.scala:49)
at defer @ org.http4s.server.blaze.Http1ServerStage$$anon$2.run(Http1ServerStage.scala:200)
at flatMap @ org.http4s.server.blaze.Http1ServerStage$$anon$2.run(Http1ServerStage.scala:202)
[1 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." `/E:/Geotrellis/Tiles/attributes?layers=tiles&zoom=10&band_count=1' does not exist in the file system, and is not recognized as a supported dataset name.
How can I solve this problem? Thank you very much!