RS_FromGeoTiff error when reading GeoTiff file
Hello,
I am getting an error when reading a GeoTiff file and invoking the "RS_FromGeoTiff" function. The code:
`val sedona = SedonaContext.create(datioSparkSession.getSparkSession)
SedonaVizRegistrator.registerAll(sedona)
val filePath = DatioFileSystem.get().qualify("/in/staging/kris/custom/Aqueduct_FL100_2030_RCP45.tif").string()
sedona.read
  .format("binaryFile")
  .load(filePath)
  .selectExpr("RS_FromGeoTiff(content) as raster", "path")
  .selectExpr("raster", "RS_Metadata(raster) as metadata")
  .show(false)`
The error thrown:
2025-01-29T09:44:40,061 [task-result-getter-1/134] [WARN] org.apache.spark.scheduler.TaskSetManager - Lost task 0.1 in stage 0.0 (TID 1) (ip-10-60-253-200.eu-south-2.compute.internal executor 13): org.apache.spark.sql.sedona_sql.expressions.InferredExpressionException: Exception occurred while evaluating expression RS_FromGeoTiff - inputs: [[B@44d7c680], cause: null at org.apache.spark.sql.sedona_sql.expressions.InferredExpression$.throwExpressionInferenceException(InferredExpression.scala:149) at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.eval(InferredExpression.scala:113) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:408) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:141) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at 
java.lang.Thread.run(Thread.java:750) Caused by: java.lang.IllegalArgumentException at sun.misc.Unsafe.copyMemory(Native Method) at com.esotericsoftware.kryo.io.UnsafeOutput.writeBytes(UnsafeOutput.java:378) at com.esotericsoftware.kryo.io.UnsafeOutput.writeFloats(UnsafeOutput.java:348) at org.apache.sedona.common.raster.serde.KryoUtil.writeFloatArrays(KryoUtil.java:234) at org.apache.sedona.common.raster.serde.DataBufferSerializer.write(DataBufferSerializer.java:58) at org.apache.sedona.common.raster.serde.AWTRasterSerializer.write(AWTRasterSerializer.java:48) at org.apache.sedona.common.raster.DeepCopiedRenderedImage.write(DeepCopiedRenderedImage.java:453) at org.apache.sedona.common.raster.serde.Serde$SerializableState.write(Serde.java:125) at org.apache.sedona.common.raster.serde.Serde.serialize(Serde.java:173) at org.apache.spark.sql.sedona_sql.expressions.raster.implicits$RasterEnhancer.serialize(implicits.scala:46) at org.apache.spark.sql.sedona_sql.expressions.InferrableRasterTypes$.rasterSerializer(InferrableRasterTypes.scala:47) at org.apache.spark.sql.sedona_sql.expressions.InferredRasterExpression$.$anonfun$rasterSerializer$1(InferredRasterExpression.scala:54) at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.eval(InferredExpression.scala:107) ... 19 more
I have tried the following:
- Same code with another file --> no error thrown
- Opening the file with QGIS --> loads the layer successfully
- Executing in a cluster environment with more memory --> same error
- Same code in Python --> a different error is thrown:
`2025-01-29T11:28:27,041 [Thread-42/107] [DEBUG] com.amazonaws.emr.recordserver.connector.spark.sql.SparkPlanValidator - plan is Project [metadata#31, raster#27, point#32, org.apache.spark.sql.sedona_sql.expressions.raster.RS_Contains AS rs_contains(raster, point)#36]+- Project [raster#27, rs_metadata(raster#27) AS metadata#31, org.apache.spark.sql.sedona_sql.expressions.ST_Point AS point#32] +- Project [ org.apache.spark.sql.sedona_sql.expressions.raster.RS_FromGeoTiff AS raster#27, path#19] +- Relation [path#19,modificationTime#20,length#21L,content#22] binaryFile
2025-01-29T11:28:27,051 [Thread-11/37] [ERROR] dataproc.Main - Exception: [NOT_INT] Argument n should be an int, got bool.
`
Could you please help me address this issue?
Thank you in advance.
I see java.lang.IllegalArgumentException being thrown by sun.misc.Unsafe.copyMemory(Native Method). The only cause I can think of is that the uncompressed pixel data of the raster is larger than 4GB; Sedona cannot serialize and transfer rasters that large. You can try using RS_TileExplode to subdivide the raster into smaller tiles and perform tile-wise operations. This may get rid of the error, but it will still be quite memory- and time-consuming.
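To get a feel for whether a raster is in that range, its uncompressed size can be estimated from its dimensions. The sketch below is only a back-of-the-envelope calculation, not Sedona's internal accounting. The raster dimensions are hypothetical (the real file's dimensions are not given in this thread), and 4 bytes per sample is assumed because the stack trace goes through `writeFloats`.

```python
def uncompressed_bytes(width, height, bands, bytes_per_sample):
    """Rough size of the decoded pixel data, ignoring any metadata overhead."""
    return width * height * bands * bytes_per_sample


FOUR_GB = 4 * 1024**3  # threshold mentioned in the comment above

# Hypothetical dimensions for a large single-band float32 raster.
whole = uncompressed_bytes(40_000, 30_000, bands=1, bytes_per_sample=4)

# A 100 x 100 float32 tile, as produced by RS_TileExplode(..., 100, 100).
tile = uncompressed_bytes(100, 100, bands=1, bytes_per_sample=4)

print(f"whole raster: {whole} bytes, exceeds 4GB: {whole > FOUR_GB}")
print(f"single tile:  {tile} bytes")
```

Under these assumed dimensions the whole raster is far above the limit, while each tile is a few tens of kilobytes.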
Hello,
We have tried the following code:
df_floods_tile = sedona.sql(f"SELECT RS_TileExplode(content, 2, 2) FROM floods_tif")
df_floods_tile = sedona.sql(f"SELECT RS_TileExplode(content, 100, 100) FROM floods_tif")
df_floods_tile = sedona.sql(f"SELECT RS_TileExplode(content, 10, 10) FROM floods_tif")
and now the error thrown is this:
`An error was encountered: An error occurred while calling o232.showString. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 10) (ip-10-60-253-102.eu-south-2.compute.internal executor 2): java.lang.IllegalArgumentException: Unsupported raster type: 73 at org.apache.sedona.common.raster.serde.Serde.deserialize(Serde.java:184) at org.apache.spark.sql.sedona_sql.expressions.raster.implicits$RasterInputExpressionEnhancer.toRaster(implicits.scala:38) at org.apache.spark.sql.sedona_sql.expressions.raster.RS_TileExplode.eval(RasterConstructors.scala:107) at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:224) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hasNext(Unknown Source) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:959) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:407) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:141) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2974) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2910) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2909) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.fore`
Is there any way to read the file correctly by tuning the arguments passed to RS_TileExplode? If not, is there some other way to handle this file with Sedona?
Thank you.
The binary content needs to be loaded by RS_FromGeoTiff before being processed by RS_TileExplode. The intermediate raster object loaded by RS_FromGeoTiff will be tiled directly without being serialized/deserialized:
df_floods_tile = sedona.sql(f"SELECT RS_TileExplode(RS_FromGeoTiff(content), 100, 100) FROM floods_tif")
If this still does not work, consider subdividing the GeoTIFF file with gdal_retile before loading it into Sedona.
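As a rough illustration of the gdal_retile route, the sketch below only builds the command line and prints it. It assumes GDAL's `gdal_retile.py` script is on the PATH and uses its `-ps` (tile size in pixels) and `-targetDir` options. The tile size and the `tiles` directory name are placeholders, and the input path is shortened to the bare file name.

```python
import shlex


def gdal_retile_command(src, target_dir, tile_width=2048, tile_height=2048):
    """Build an argv list for gdal_retile.py; nothing is executed here."""
    return [
        "gdal_retile.py",
        "-ps", str(tile_width), str(tile_height),  # tile size in pixels
        "-targetDir", target_dir,                  # output directory for the tiles
        src,
    ]


cmd = gdal_retile_command("Aqueduct_FL100_2030_RCP45.tif", "tiles")
print(shlex.join(cmd))

# To actually run it (requires GDAL; the target directory should exist first):
# import subprocess
# subprocess.run(cmd, check=True)
```

The resulting tile files could then be read with the same binaryFile + RS_FromGeoTiff code, each one small enough to be serialized on its own.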
How am I supposed to call RS_FromGeoTiff first if that call throws the error I reported at the beginning?
Actually, nesting RS_FromGeoTiff inside another RS function call changes its behavior. Passing raster objects between Sedona function calls does not require serializing the entire raster value; this is handled by SerdeAware.evalWithoutSerialization.
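The difference between the two evaluation paths can be pictured with a toy model. This is not Sedona code: every class and function name below is made up for illustration, and the "file content" is just a pair of dimensions. It only shows that the nested path hands the in-memory object to the outer function, so only the tiles are ever serialized.

```python
import pickle

serialized_sizes = []  # (width, height) of every raster that gets serialized


class ToyRaster:
    """Stand-in for an in-memory raster object."""

    def __init__(self, width, height):
        self.width = width
        self.height = height


def serialize(raster):
    """Record what is being serialized, then produce some bytes."""
    serialized_sizes.append((raster.width, raster.height))
    return pickle.dumps((raster.width, raster.height))


def load_raster(content):
    """Plays the role of evalWithoutSerialization: returns the object itself."""
    width, height = content
    return ToyRaster(width, height)


def load_raster_as_column(content):
    """Plays the role of a top-level call: the whole raster is serialized."""
    return serialize(load_raster(content))


def tile_explode(raster, tile_w, tile_h):
    """Cut the in-memory raster into tiles and serialize each tile separately."""
    tiles = []
    for y in range(0, raster.height, tile_h):
        for x in range(0, raster.width, tile_w):
            tile = ToyRaster(min(tile_w, raster.width - x),
                             min(tile_h, raster.height - y))
            tiles.append(serialize(tile))
    return tiles


def largest_serialized(content, nested):
    """Return the largest (width, height) serialized on the chosen path."""
    serialized_sizes.clear()
    if nested:
        tile_explode(load_raster(content), 100, 100)
    else:
        load_raster_as_column(content)
    return max(serialized_sizes)


print(largest_serialized((1000, 500), nested=False))
print(largest_serialized((1000, 500), nested=True))
```

In the top-level case the full 1000 x 500 raster goes through `serialize`; in the nested case nothing larger than a 100 x 100 tile does.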
For instance, the following code could produce a DataFrame of small tiles from a large raster that cannot be serialized as a whole:
(sedona.read
    .format("binaryFile")
    .load(filePath)
    .selectExpr("RS_TileExplode(RS_FromGeoTiff(content), 100, 100) as (x, y, tile)", "path")
    .selectExpr("tile", "RS_Metadata(tile) as metadata")
    .show(10))
With exactly that code, I still get the following error, which is slightly different from the previous one:
org.apache.spark.sql.sedona_sql.expressions.InferredExpressionException: Exception occurred while evaluating expression RS_FromGeoTiff - inputs: [[B@1d45d5d0], cause: I/O error reading image metadata! at org.apache.spark.sql.sedona_sql.expressions.InferredExpression$.throwExpressionInferenceException(InferredExpression.scala:149) at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.evalWithoutSerialization(InferredExpression.scala:127) at org.apache.spark.sql.sedona_sql.expressions.raster.implicits$RasterInputExpressionEnhancer.toRaster(implicits.scala:34) at org.apache.spark.sql.sedona_sql.expressions.raster.RS_TileExplode.eval(RasterConstructors.scala:107) at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490) at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:407) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:141) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: org.geotools.data.DataSourceException: I/O error reading image metadata! at org.geotools.gce.geotiff.GeoTiffReader.<init>(GeoTiffReader.java:288) at org.apache.sedona.common.raster.RasterConstructors.fromGeoTiff(RasterConstructors.java:74) at org.apache.spark.sql.sedona_sql.expressions.raster.RS_FromGeoTiff$$anonfun$$lessinit$greater$6.apply(RasterConstructors.scala:55) at org.apache.spark.sql.sedona_sql.expressions.raster.RS_FromGeoTiff$$anonfun$$lessinit$greater$6.apply(RasterConstructors.scala:55) at org.apache.spark.sql.sedona_sql.expressions.InferrableFunctionConverter$.$anonfun$inferrableFunction1$2(InferrableFunctionConverter.scala:39) at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.evalWithoutSerialization(InferredExpression.scala:121) ... 23 more Caused by: org.geotools.data.DataSourceException: I/O error reading image metadata! at org.geotools.gce.geotiff.GeoTiffReader.getHRInfo(GeoTiffReader.java:584) at org.geotools.gce.geotiff.GeoTiffReader.<init>(GeoTiffReader.java:274) ... 28 more Caused by: javax.imageio.IIOException: I/O error reading image metadata! at it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReader.readMetadata(TIFFImageReader.java:887) at it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReader.seekToImage(TIFFImageReader.java:834) at it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReader.getImageMetadata(TIFFImageReader.java:1446) at org.geotools.gce.geotiff.GeoTiffReader.getHRInfo(GeoTiffReader.java:340) ... 
29 more Caused by: java.io.EOFException at javax.imageio.stream.ImageInputStreamImpl.readShort(ImageInputStreamImpl.java:229) at javax.imageio.stream.ImageInputStreamImpl.readUnsignedShort(ImageInputStreamImpl.java:242) at it.geosolutions.imageioimpl.plugins.tiff.TIFFIFD.initialize(TIFFIFD.java:237) at it.geosolutions.imageioimpl.plugins.tiff.TIFFImageMetadata.initializeFromStream(TIFFImageMetadata.java:148) at it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReader.readMetadata(TIFFImageReader.java:881) ... 32 more
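The innermost cause here is an EOFException raised while the TIFF reader initializes an image file directory (TIFFIFD.initialize). One possible diagnostic, offered as a sketch and not as a confirmed explanation, is to inspect the first bytes of the `content` column and check whether the header is classic TIFF or BigTIFF and whether the first directory offset lies inside the bytes that were actually loaded. The helper name and the returned dictionary are made up for this example; the header layout follows the TIFF and BigTIFF formats.

```python
import struct


def inspect_tiff_header(header, total_length):
    """Decode byte order, variant and first IFD offset from a TIFF header.

    header: at least the first 16 bytes of the file.
    total_length: number of bytes available to the reader.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 header bytes")
    if header[:2] == b"II":
        endian = "<"  # little-endian
    elif header[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF byte-order mark")

    (magic,) = struct.unpack(endian + "H", header[2:4])
    if magic == 42:  # classic TIFF: 32-bit offset at bytes 4..7
        variant = "classic"
        (ifd_offset,) = struct.unpack(endian + "I", header[4:8])
    elif magic == 43:  # BigTIFF: 64-bit offset at bytes 8..15
        if len(header) < 16:
            raise ValueError("need 16 header bytes for BigTIFF")
        variant = "bigtiff"
        (ifd_offset,) = struct.unpack(endian + "Q", header[8:16])
    else:
        raise ValueError(f"unexpected TIFF magic number: {magic}")

    return {
        "byte_order": header[:2].decode("ascii"),
        "variant": variant,
        "first_ifd_offset": ifd_offset,
        "ifd_within_content": ifd_offset < total_length,
    }


# Synthetic headers, built here only to exercise the function.
classic = struct.pack("<2sHI", b"II", 42, 8) + bytes(8)
big = struct.pack("<2sHHHQ", b"II", 43, 8, 0, 5_000_000_000)

print(inspect_tiff_header(classic, total_length=1000))
print(inspect_tiff_header(big, total_length=2**31 - 1))
```

With the binaryFile source, the header could be taken from the first 16 bytes of `content` and the length from the `length` column. If `ifd_within_content` comes out false, the reader would run off the end of the data when seeking to the directory, which would be consistent with the EOFException above. As a side note, the first byte of a little-endian TIFF is 73 (ASCII "I"), which may be what the earlier "Unsupported raster type: 73" message was reporting when raw `content` was passed to RS_TileExplode.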