Job upscaling: spatial splitting utilities

Open jdries opened this issue 10 months ago • 5 comments

When using the job manager, users still need to somehow construct the GeoDataFrame that defines initial job splitting.

A typical use case is that users want to run a job over e.g. a full country, and don't want to know the details about tile grids.

There are 'well-known' tile grids that apply: UTM at different sizes for global processing, and LAEA for Europe.

The utility should focus on making UDP-based upscaling as simple as possible, reducing the required input parameters to a minimum while keeping optional parameters for users who have preferences. It is also an option for the UDP itself to formally indicate splitting options, like a preferred tile grid. For instance, if the UDP author knows that the included job options are optimized for 20 km tiles, it makes sense for the job splitter to take this into account, allowing for a more predictable outcome.
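To make the idea concrete, here is a minimal sketch of fixed-grid splitting, not an existing openeo API. The function name `split_bbox` and its parameters are hypothetical, and it assumes the bounding box is already expressed in a projected CRS with metre units (e.g. a UTM zone or LAEA). It snaps the extent outward to multiples of the tile size, so the tiles always fall on the same grid regardless of the requested extent.

```python
import math


def split_bbox(west, south, east, north, tile_size=20_000):
    """Split a projected bounding box (metres) into grid-aligned square tiles.

    Hypothetical helper for illustration only; tile_size defaults to 20 km.
    """
    tile_size = int(tile_size)
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    if east <= west or north <= south:
        raise ValueError("empty bounding box")

    # Snap outward to multiples of tile_size so tiles line up on a fixed grid.
    x0 = math.floor(west / tile_size) * tile_size
    y0 = math.floor(south / tile_size) * tile_size
    x1 = math.ceil(east / tile_size) * tile_size
    y1 = math.ceil(north / tile_size) * tile_size

    tiles = []
    for x in range(x0, x1, tile_size):
        for y in range(y0, y1, tile_size):
            tiles.append(
                {"west": x, "south": y, "east": x + tile_size, "north": y + tile_size}
            )
    return tiles


# A 40 km x 20 km extent that is offset from the grid needs 3 x 2 tiles.
tiles = split_bbox(5_000, 5_000, 45_000, 25_000)
print(len(tiles))
```

Each dict could then become one row of the GeoDataFrame passed to the job manager, for example by building a polygon with `shapely.geometry.box` and setting the matching CRS. Handling lon/lat input, UTM zone boundaries, or clipping tiles to a country outline is deliberately left out of this sketch.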

https://github.com/Open-EO/openeo-python-client/blob/c1589a842161cccc27cf466c0c2b043da2eba906/docs/cookbook/job_manager.rst#L124

Existing and similar code in aggregator: https://github.com/Open-EO/openeo-aggregator/blob/master/src/openeo_aggregator/partitionedjobs/splitting.py

jdries avatar Feb 19 '25 09:02 jdries

We now have 3 types of job splitters that we are looking into, @VictorVerhaert.

@jdries would you propose here to have it 'non-optimized' and fixed splitting per 20km tiles?

Or should we eventually also offer alternative options, such as Sentinel-2 tile-based splitting, etc.?

HansVRP avatar Feb 19 '25 10:02 HansVRP

Should we perhaps turn this into an epic? Because non-optimized would be the most basic thing to start with, and then we should indeed add things like UTM grid. (Overlapping S2 tiles is not a grid I typically recommend, so that would be more like an advanced option.)

jdries avatar Feb 19 '25 10:02 jdries

To get the correct scope of this issue: is it only for splitting bounding boxes (e.g. for inference) or also for geometries (e.g. for point extractions)?

The 3 existing job splitters in gfmap are mainly meant for extractions, I would say, so they are maybe not that relevant for this issue.

@jdries do we know what the current maximum spatial extent is, given all the optimizations and parallelisations done for lcfm? Perhaps running a job for a whole country is already possible, and we could rather focus on testing it with an output tiling grid set, like the save_result option for GTiffs, to ensure we don't create COGs that are too large.

VictorVerhaert avatar Feb 19 '25 10:02 VictorVerhaert

@jdries it would perhaps be good if we indeed create a couple of sub-tasks for this one.

It would be good to have a clear view of how we want users to 'interact' with or 'experience' this feature.

HansVRP avatar Feb 24 '25 10:02 HansVRP

We need to validate whether the solution would also be applicable for WEED inference runs.

HansVRP avatar Mar 18 '25 07:03 HansVRP