rio-tiler Band expression for MultiBaseReader

ref https://github.com/cogeotiff/rio-tiler/pull/523#discussion_r963304743

Right now when passing expression to MultiBaseReader (STAC) methods, we assume each asset is one band e.g:

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, expression="green/red")
    print(ima.count)
    >>> 1  # it returns a 1 band image

if one asset has more than one band and if no other option is set, the result could be unexpected.

The way expression works is by using

def apply_expression(
    blocks: Sequence[str], bands: Sequence[Union[str, int]], data: numpy.ndarray,
) -> numpy.ndarray:
    return numpy.array(
        [
            numpy.nan_to_num(
                numexpr.evaluate(bloc.strip(), local_dict=dict(zip(bands, data)))
            )
            for bloc in blocks
            if bloc
        ]
    )

where we assign a name to each band within the numpy array

data = numpy.zeros((3, 256, 256))
list(zip(["b1", "b2"], data))
>>> [('b1',
  array([[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]])),
 ('b2',
  array([[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]))]

In ☝️ we have a 3 bands data but a 2 bands list ("b1", "b2"), and we construct the dict that will be passed numexpr. We can see that no errors is raised while we have 3 bands and only 2 names.

Proposal

with #523 we now have bands names all the way during the processes, e.g

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"))
    print(ima.band_names)
    >>> ["green_b1", "red_b1"]

It means that in theory expression could be in form of expression="green_b1/red_b1"

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, expression="green_b1/red_b1")
    print(ima.band_names)
    >>> ["green_b1/red_b1"]

internally rio-tiler will know that for expression "green_b1/red_b1" we want to read assets green and red. Then we will have an image with band_names="green_b1", "red_b1". Those names will be used in the apply_expresion directly because apply_expression is a new method of the ImageData class

https://github.com/cogeotiff/rio-tiler/blob/55ae89749204de11eef26e06c962d9744241f132/rio_tiler/models.py#L278-L283

potential issue

Another question is what happens when we use asset_expression?

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"), asset_expression={"green": "b1+2"})
    print(ima.band_names)
    >>> ["green_b1+2", "red_b1"]

☝️ we have then a band name that is "green_b1+2" so if we wanted to use expression we would have to write expression= "green_b1+2/red_b1" but that won't work because rio-tiler will try to parse the main expression and re-apply! +2 to green_b1+2!!! One solution would be to use _b1 as the first band of the asset even if there is expression applied.

Status Quo ?

While I feel ☝️ is better, I understand that it's a big breaking change.

Sep 07 '22 11:09 vincentsarago

In ☝️ we have a 3 bands data but a 2 bands list ("b1", "b2"), and we construct the dict that will be passed numexpr. We can see that no errors is raised while we have 3 bands and only 2 names.

I've added a check in https://github.com/cogeotiff/rio-tiler/pull/523/commits/d883df00f134aed83ad5c1097e4952ff827b812e to make sure the number of bands and data shape match

Sep 07 '22 12:09 vincentsarago

It means that in theory expression could be in form of expression="green_b1/red_b1"

I like this. We are breaking things anyways so might as well get it right now. Representing the full hierarchy of asset::band seems like the best way to support applying expressions across multi-band assets.

One solution would be to use _b1 as the first band of the asset even if there is expression applied.

I think appending _b1 is a good way to normalize applying of an expression to single and multi-band images.

Another question is what happens when we use asset_expression?

Can asset expressions be removed entirely if a user can pass assets to the expression? It seems to me the difference between expression and asset_expression is the former may be used to combine data from multiple assets while the latter applies one expression to each asset individually. Which means everything you can do with asset_expression you could accomplish with multiple calls to expression:

# one asset_expression
asset_expression={"green": "b1+2", "red": "b1+3"}

# two expressions
expression="green_b1+2"
expression="green_b1+3"

Applying multiple expressions like this would be much easier if ImageData.apply_expression did not mutate the ImageData instance but instead created a new ImageData instance with modified data. This way a user could do something like:

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8)
    first_expression = ima.apply_expression("green_b1+2")
    second_expression = ima.apply_expression("green_b1+2")
    third_exression = ima.apply_expression("green_b1+2/red_b1")

This example ☝️ got me thinking that maybe including expression in the COGReader.tile function call is overloaded now that it is an instance method on the object returned by that call 🤷

Sep 08 '22 15:09 geospatial-jeff

🙏 thanks @geospatial-jeff

I like this. We are breaking things anyways so might as well get it right now. Representing the full hierarchy of asset::band seems like the best way to support applying expressions across multi-band assets.

+1

Can asset expressions be removed entirely if a user can pass assets to the expression

I think yes! +1

Applying multiple expressions like this would be much easier if ImageData.apply_expression did not mutate the ImageData instance but instead created a new ImageData instance with modified data.

Yes, just one thing: in your example you still need a way to tell the tile method which assets to read

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"))
    first_expression = ima.apply_expression("green_b1+2")
    second_expression = ima.apply_expression("green_b1+2")
    third_exression = ima.apply_expression("green_b1+2/red_b1")

This example ☝️ got me thinking that maybe including expression in the COGReader.tile function call is overloaded now that it is an instance method on the object returned by that call 🤷

Sadly we still need to pass the expression to tile so it parse it and know which asset to read

Sep 08 '22 19:09 vincentsarago

rio-tiler rio-tiler copied to clipboard

Band expression for MultiBaseReader

Proposal

potential issue

Status Quo ?

rio-tiler
rio-tiler copied to clipboard