SpacePartitioner distributes everything to just a few partitions/executors
Describe the bug
I might just be using this wrong, because I can't find much documentation at all on how to use the SpacePartitioner. I'm trying to use the SpacePartitioner on an RDD of type RDD[(SpatialKey, geotrellis.vector.Feature)]. I've created a LayoutDefinition that covers the entire pan-tropics (so 180E-180W and 30S-30N), and vector features cover are land in those bounds, so I just set the KeyBounds to the entire layout.
But even though I'm running with ~700 executors, it's giving everything to just 3 executors (vs. when I use HashPartitioner, where it distributes to pretty much all executors). I'm not sure if I'm just using it incorrectly (my bounds seem wrong) or it just doesn't support global-scale partitioning that well (this post makes it seem like it's intended for more local areas with clear bounding boxes).
Environment
- Scala version: 2.11.12
- Spark version: 2.4.0
- GeoTrellis version: 2.2.0
Hi @jterry64, can you give a number of partitions with the hash partitioner and space partitioner?
Also the range of keys you have in RDDs and the exact SpacePartitioner / HashPartitioner you're using?
Do you have some codesample with an example of how you use partitioners?
A code sample is pretty much all like this:
val spatialPart = new SpacePartitioner(KeyBounds(SpatialKey(0, 0), SpatialKey(GladAlertsGrid.blockTileGrid.layoutCols, GladAlertsGrid.blockTileGrid.layoutRows)), )
Where GladAlertsGrid.blockTileGrid is a LayoutDefinition where each cell is the block size across the extent mentioned above. So in this case, it ends up 3600 columns and 600 rows. It looks like the SpatialPartitioner is making 8588 partitions from that grid. Then I just call partitionBy(spatialPart) on an RDD with type RDD[(SpatialKey, Feature)], where Feature is geotrellis vector features. Also fyi, there may be many overlapping features per Spatial Key.