ValueError on tile_to_layout for old SpaceTime data
I create a RasterLayer of type SPACETIME
like so:
import datetime
import geopyspark as gps

# Wrap the tile's extent, CRS, and timestamp; note the pre-1970 instant.
temporal_projected_extent = gps.TemporalProjectedExtent(extent=extent, proj4=crs, instant=datetime.datetime(1955, 1, 4))
tile = gps.Tile.from_numpy_array(var_data_at_instant, no_data_value)
tiles = [(temporal_projected_extent, tile)]
rdd = spark_ctx.parallelize(tiles)
raster_layer = gps.RasterLayer.from_numpy_rdd(layer_type=gps.LayerType.SPACETIME, numpy_rdd=rdd)
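For reference, here are hypothetical definitions for the names the snippet leaves undefined (extent, crs, var_data_at_instant, no_data_value, spark_ctx); all concrete values are placeholders, and any extent, CRS, and array should reproduce the error as long as the instant predates 1970:

import numpy as np
from pyspark import SparkContext

spark_ctx = SparkContext(conf=gps.geopyspark_conf(appName="spacetime-repro"))
extent = gps.Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0)
crs = "+proj=longlat +datum=WGS84 +no_defs"    # proj4 string for the layer CRS
no_data_value = -9999.0
var_data_at_instant = np.zeros((1, 256, 256))  # (bands, rows, cols)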
When running
tiled_raster_layer = raster_layer.tile_to_layout(gps.LocalLayout(y, x))
I get an exception:
2019-03-10 17:05:43 ERROR Executor:91 - Exception in task 2.0 in stage 2.0 (TID 10)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\worker.py", line 376, in main
File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\worker.py", line 371, in process
File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 142, in dump_stream
self._write_with_length(obj, stream)
File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 152, in _write_with_length
serialized = self.dumps(obj)
File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufserializer.py", line 75, in dumps
return self._dumps(obj)
File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufserializer.py", line 56, in _dumps
return self.encoding_method(obj)
File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufcodecs.py", line 650, in tuple_encoder
tup.temporalProjectedExtent.CopyFrom(to_pb_temporal_projected_extent(obj[0]))
File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufcodecs.py", line 553, in to_pb_temporal_projected_extent
tpex.instant = _convert_to_unix_time(obj.instant)
ValueError: Value out of range: -473126400000
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
The problem seems to be that geopyspark converts the instant to milliseconds since the Unix epoch (1970-01-01), which for 1955-01-04 is -473126400000. Because the date predates the epoch, the value is negative, and the protobuf encoder rejects it as out of range (presumably the instant field cannot represent negative timestamps). Running the same code on the same RDD but with an instant after 1970, e.g. 1980, works just fine.
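A minimal sketch of the conversion (the helper to_unix_ms is hypothetical; geopyspark's own _convert_to_unix_time may differ in detail) shows why 1955 yields a negative value while 1980 does not:

import datetime

def to_unix_ms(dt):
    # Milliseconds since the Unix epoch (1970-01-01); negative for earlier dates.
    epoch = datetime.datetime(1970, 1, 1)
    return int((dt - epoch).total_seconds() * 1000)

print(to_unix_ms(datetime.datetime(1955, 1, 4)))  # -473126400000 (rejected)
print(to_unix_ms(datetime.datetime(1980, 1, 4)))  # 315792000000 (accepted)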