sentinelhub-py

Tiles and dates obtained from WcsRequest do not correspond to each other

Open pisarik opened this issue 5 years ago • 5 comments

Problem statement

I call req = WcsRequest(...) and then want to filter the tiles further before the actual download. So I use req.get_tiles() and req.get_dates() to build indexes_to_download for the subsequent call req.save_data(data_filter=indexes_to_download).

But it turned out that the lists returned by get_tiles() and get_dates() do not correspond to each other.

  1. Generally, get_tiles() and get_dates() are ordered in opposite directions, i.e. I need to reverse one of them to make them correspond to each other.
  2. But if the request was made with the time_difference parameter and the bounding box lies on an intersection of tiles, things get even worse. Then I get lists of different sizes, i.e.
    len(req.get_tiles()) != len(req.get_dates())
    
    In this case it is even difficult to understand which indexes I need to provide to save_data: those from get_dates() or those from get_tiles(). I guess that I need to use the indexes of get_dates(), because get_dates() seems to correspond to get_download_list(), at least by length.

So eventually I need to write code for matching between the lists of dates and tiles.
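A minimal sketch of what such matching code could look like, working on plain datetime objects rather than on the library's objects. The helper name group_tiles_by_date and the sample timestamps are hypothetical, and the matching rule itself (a tile belongs to the latest returned date that is not later than the tile's own timestamp) is an assumption, not something the library documents.

```python
import bisect
import datetime as dt


def group_tiles_by_date(tile_times, dates):
    """Map each index in `dates` to the indexes of the tiles assigned to it.

    Assumption (not confirmed by the library): a tile belongs to the latest
    date in `dates` that is not later than the tile's own timestamp.
    """
    # Sort the dates but remember their original positions, because those
    # positions are what would be passed to save_data(data_filter=...).
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    sorted_dates = [dates[i] for i in order]
    groups = {i: [] for i in range(len(dates))}
    for tile_idx, tile_time in enumerate(tile_times):
        pos = bisect.bisect_right(sorted_dates, tile_time) - 1
        if pos >= 0:  # tiles older than every date stay unassigned
            groups[order[pos]].append(tile_idx)
    return groups


# Hypothetical tile timestamps, newest first.
tile_times = [
    dt.datetime(2019, 5, 3, 10, 46, 50),
    dt.datetime(2019, 5, 3, 10, 46, 39),
    dt.datetime(2019, 5, 1, 10, 56, 45),
    dt.datetime(2019, 5, 1, 10, 56, 39),
]
# Hypothetical dates, oldest first.
dates = [
    dt.datetime(2019, 5, 1, 10, 56, 39),
    dt.datetime(2019, 5, 3, 10, 46, 39),
]

groups = group_tiles_by_date(tile_times, dates)
print(groups)
```

In real use the tile timestamps would have to be parsed from tile['properties']['date'] and tile['properties']['time'], and the keys of the resulting dictionary would be the candidate indexes for data_filter.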

Minimal reproducing code (the worst case)

import datetime

from sentinelhub import WcsRequest, BBox, CRS


def main(INSTANCE_ID):
    intersection_of_tiles_wgs84 = [5.62788, 51.62314, 6.44577, 51.05245]
    intersection_bbox = BBox(bbox=intersection_of_tiles_wgs84, crs=CRS.WGS84)

    req = WcsRequest(layer='BANDS-S2-L1C',
                     bbox=intersection_bbox,
                     time=('2019-05-01', '2019-06-01'),
                     time_difference=datetime.timedelta(minutes=60),
                     resx='10m', resy='10m',
                     instance_id=INSTANCE_ID)

    tiles = req.get_tiles().tile_list
    dates = req.get_dates()
    for tile, date in zip(tiles, dates):
        print(tile['properties']['date'], tile['properties']['time'],
              ' | ',
              date)
    print('Number of tiles: ', len(tiles))
    print('Number of dates: ', len(dates))

Output

2019-05-31 10:56:59  |  2019-05-01 10:56:39
2019-05-31 10:56:54  |  2019-05-03 10:46:39
2019-05-31 10:56:50  |  2019-05-06 10:56:35
2019-05-31 10:56:45  |  2019-05-08 10:46:44
2019-05-31 10:56:41  |  2019-05-11 10:56:40
2019-05-31 10:56:39  |  2019-05-13 10:46:39
2019-05-28 10:47:03  |  2019-05-16 10:56:35
2019-05-28 10:47:00  |  2019-05-18 10:46:44
2019-05-28 10:46:58  |  2019-05-23 10:46:39
2019-05-28 10:46:48  |  2019-05-26 10:56:34
2019-05-28 10:46:45  |  2019-05-28 10:46:44
2019-05-28 10:46:44  |  2019-05-31 10:56:39
Number of tiles:  72
Number of dates:  12

Default code I would like to write

Here is the code that I would expect to work correctly, but it doesn't.

req = WcsRequest(**any_parameters)

# filter before download
idxs = []
for i, (tile, date) in enumerate(zip(req.get_tiles(), req.get_dates())):
    if need_to_download(tile, date):
        idxs.append(i)

req.save_data(data_filter=idxs)

Is it a bug or maybe I haven't found a proper way of doing it?

Anyway, thanks for the great API !:)

pisarik, Sep 16 '19 19:09

I believe that get_dates() returns only one date per orbit (or datatake), whereas get_tiles() returns the metadata for all the tiles in the orbit. So you need to decide what you would like to achieve and use the appropriate function. You can also take into account that a Sentinel-2 orbit takes 100 minutes when setting time buffers.

gmilcinski, Sep 17 '19 08:09

@gmilcinski thanks for the answer!

get_dates() returns one date per orbit only in this particular example. If I remove time_difference, then get_dates() returns the same list as get_tiles(), but reversed. And if I set time_difference to 10 seconds, it returns 24 dates.

Inconsistency with documentation

get_dates

I have just reread the documentation for the get_dates method more thoroughly. At the end it states:

Most recent acquisition being first in the list.

But it can be seen from the reproducing code that the first date in the list is the oldest one.
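The ordering can be checked directly. This is a small sketch that hard-codes the first three dates from the output above instead of calling the service; the variable names are mine.

```python
import datetime as dt

# First three values printed for get_dates() in the output above.
dates = [
    dt.datetime(2019, 5, 1, 10, 56, 39),
    dt.datetime(2019, 5, 3, 10, 46, 39),
    dt.datetime(2019, 5, 6, 10, 56, 35),
]

is_oldest_first = dates == sorted(dates)
is_newest_first = dates == sorted(dates, reverse=True)
print(is_oldest_first, is_newest_first)
```

For these values is_oldest_first is True and is_newest_first is False, which is the opposite of what the documentation states.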

get_tiles

Documentation

Returns iterator over info about all satellite tiles used for the OgcRequest

It is not quite clear what "all satellite tiles used for the request" means. The method does filter out meta information for all tiles outside the date range, but it fails to filter meta information with respect to the time_difference parameter. Both parameters were used for the OgcRequest.

My intention

I believe that get_tiles is intended to provide meta information for the tiles that will be downloaded.

I want to obtain meta information for the tiles that will be downloaded. How can I achieve that?

pisarik, Sep 17 '19 10:09

Well, there is a small inconsistency at the end of the method OgcImageService.get_dates().

Problem code

    class OgcImageService(OgcService):
        # ...
        def get_dates(self, request):  # documented as returning the most recent date first
            # ...
            return OgcService._filter_dates(dates, request.time_difference)


    class OgcService:
        def _filter_dates(dates, time_difference):  # documented as returning the oldest date first
            # ...
            sorted_dates = sorted(dates)  # oldest dates first
            # filter with respect to time_difference; the order is not changed
            return separate_dates

It seems that no important code depends on the assumption that get_dates should return the most recent acquisitions first.

Anyway, I suggest clarifying the documentation of the get_tiles() method and adding a new method, get_filtered_tiles(), that returns meta information consistent with the get_dates() and get_download_list() methods.

I think it is a very natural wish to save meta information together with a tile. But in the current state of the API, doing so requires duplicating the code of the _filter_dates(..) method, which can lead to errors in client code.
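To illustrate what that duplication might look like, here is a rough client-side sketch of the filtering. It is not the library's code: the function name filter_dates is mine, the timestamps are hypothetical, and I am assuming that each date is compared against the last date that was kept.

```python
import datetime as dt


def filter_dates(dates, time_difference):
    """Keep the oldest date of every run of dates lying within time_difference.

    Sketch of the behaviour described above, not the library's code.
    Assumption: each date is compared against the last *kept* date.
    """
    separate_dates = []
    for date in sorted(dates):  # oldest first
        if not separate_dates or date - separate_dates[-1] > time_difference:
            separate_dates.append(date)
    return separate_dates


# Hypothetical acquisition timestamps in arbitrary order.
dates = [
    dt.datetime(2019, 5, 1, 10, 56, 45),
    dt.datetime(2019, 5, 1, 10, 56, 39),
    dt.datetime(2019, 5, 3, 10, 46, 39),
]

kept = filter_dates(dates, dt.timedelta(minutes=60))
print(kept)
```

With a 60-minute time_difference the two timestamps from 2019-05-01 collapse into the older one, so two dates remain. Any client that wants tile metadata per returned date has to keep logic like this in sync with the library by hand.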

Is it worth making a pull request for this?

pisarik, Sep 18 '19 11:09

Hi @pisarik,

I checked the code and I can confirm the following:

  • There is a mistake in the documentation of the get_dates method. The method will always return dates from least recent to most recent.
  • The documentation of get_tiles could have a more detailed explanation.

The reason why get_tiles and get_dates return a different number of items is the following:

A user-specified bounding box is likely to overlap with multiple Sentinel-2 tiles that either have exactly the same acquisition timestamp or timestamps that differ by a few minutes. Therefore the Sentinel Hub service will collect data from multiple tiles with the same timestamp and join them into a single image that is provided through the Python package. This means that one part of an image could be obtained from one tile and another part from a different tile. Hence get_tiles returns info about all tiles from which the Sentinel Hub service will collect some data. On the other hand, get_dates returns the acquisition dates of the returned images. The time_difference parameter tells the Sentinel Hub service that it can also join acquisitions that differ by the specified amount of time (e.g. a few minutes or hours) into a single one.

We have just started working on adding support for the new Sentinel Hub API in this package, so we can update this part of the documentation as well.

Regarding your suggestion about get_filtered_tiles, I don't think it makes sense, as there will usually be more tiles involved than the number of images returned by the service. The service also doesn't report whether some of these tiles haven't been used at all.

AleksMat, Sep 25 '19 12:09

Dear @AleksMat,

Thank you very much for the thorough explanation!

Merging several tiles into a single image

Are you sure that tiles with timestamps within time_difference will be merged into a single image? This parameter has almost no explanation in the documentation. The only place it appears is the 8th example, and nothing is said there about merging tiles into one single image.

As far as I could understand from the example explanation and the code documentation [1] [2], if several tiles happen to fall within the time_difference interval, then all of them will be rejected except the oldest one.

If they really are merged into a single image, could you please explain how the merging works? Namely, how are conflicts resolved when non-blank areas from several tiles overlap?

Obtaining meta information for the downloaded images

I think you missed my point. Suppose we are in the good case (the bbox belongs to only one tile). I want to save the tile's meta information (cloud coverage, path on AWS, etc.) along with the images. How can I get this meta information?

Proper question for all cases: How can I get a list of meta information for the tiles that were used to produce a downloaded image?

pisarik, Sep 28 '19 20:09

Closing due to long-term inactivity

zigaLuksic, Jan 18 '23 09:01