geotrellis icon indicating copy to clipboard operation
geotrellis copied to clipboard

SpacePartitioner distributes everything to just a few partitions/executors

Open jterry64 opened this issue 5 years ago • 2 comments

Describe the bug

I might just be using this wrong, because I can't find much documentation at all on how to use the SpacePartitioner. I'm trying to use the SpacePartitioner on an RDD of type RDD[(SpatialKey, geotrellis.vector.Feature)]. I've created a LayoutDefinition that covers the entire pan-tropics (so 180E-180W and 30S-30N), and vector features cover are land in those bounds, so I just set the KeyBounds to the entire layout.

But even though I'm running with ~700 executors, it's giving everything to just 3 executors (vs. when I use HashPartitioner, where it distributes to pretty much all executors). I'm not sure if I'm just using it incorrectly (my bounds seem wrong) or it just doesn't support global-scale partitioning that well (this post makes it seem like it's intended for more local areas with clear bounding boxes).

Environment

  • Scala version: 2.11.12
  • Spark version: 2.4.0
  • GeoTrellis version: 2.2.0

jterry64 avatar Jul 22 '20 21:07 jterry64

Hi @jterry64, can you give a number of partitions with the hash partitioner and space partitioner?

Also the range of keys you have in RDDs and the exact SpacePartitioner / HashPartitioner you're using?

Do you have some codesample with an example of how you use partitioners?

pomadchin avatar Jul 23 '20 20:07 pomadchin

A code sample is pretty much all like this:

val spatialPart = new SpacePartitioner(KeyBounds(SpatialKey(0, 0), SpatialKey(GladAlertsGrid.blockTileGrid.layoutCols, GladAlertsGrid.blockTileGrid.layoutRows)), )

Where GladAlertsGrid.blockTileGrid is a LayoutDefinition where each cell is the block size across the extent mentioned above. So in this case, it ends up 3600 columns and 600 rows. It looks like the SpatialPartitioner is making 8588 partitions from that grid. Then I just call partitionBy(spatialPart) on an RDD with type RDD[(SpatialKey, Feature)], where Feature is geotrellis vector features. Also fyi, there may be many overlapping features per Spatial Key.

jterry64 avatar Jul 27 '20 21:07 jterry64