rio-tiler
rio-tiler copied to clipboard
Band expression for MultiBaseReader
ref https://github.com/cogeotiff/rio-tiler/pull/523#discussion_r963304743
Right now when passing expression
to MultiBaseReader (STAC) methods, we assume each asset is one band e.g:
with STACReader("stac.json") as stac:
ima = stac.tile(701, 102, 8, expression="green/red")
print(ima.count)
>>> 1 # it returns a 1 band image
if one asset has more than one band and if no other option is set, the result could be unexpected.
The way expression
works is by using
def apply_expression(
blocks: Sequence[str], bands: Sequence[Union[str, int]], data: numpy.ndarray,
) -> numpy.ndarray:
return numpy.array(
[
numpy.nan_to_num(
numexpr.evaluate(bloc.strip(), local_dict=dict(zip(bands, data)))
)
for bloc in blocks
if bloc
]
)
where we assign a name
to each band within the numpy array
data = numpy.zeros((3, 256, 256))
list(zip(["b1", "b2"], data))
>>> [('b1',
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]])),
('b2',
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]))]
In ☝️ we have a 3 bands data but a 2 bands list ("b1", "b2"), and we construct the dict that will be passed numexpr
. We can see that no errors is raised while we have 3 bands and only 2 names.
Proposal
with #523 we now have bands names all the way during the processes, e.g
with STACReader("stac.json") as stac:
ima = stac.tile(701, 102, 8, assets=("green", "red"))
print(ima.band_names)
>>> ["green_b1", "red_b1"]
It means that in theory expression could be in form of expression="green_b1/red_b1"
with STACReader("stac.json") as stac:
ima = stac.tile(701, 102, 8, expression="green_b1/red_b1")
print(ima.band_names)
>>> ["green_b1/red_b1"]
internally rio-tiler will know that for expression "green_b1/red_b1"
we want to read assets green
and red
. Then we will have an image with band_names="green_b1", "red_b1"
. Those names will be used in the apply_expresion
directly because apply_expression is a new method of the ImageData class
https://github.com/cogeotiff/rio-tiler/blob/55ae89749204de11eef26e06c962d9744241f132/rio_tiler/models.py#L278-L283
potential issue
Another question is what happens when we use asset_expression
?
with STACReader("stac.json") as stac:
ima = stac.tile(701, 102, 8, assets=("green", "red"), asset_expression={"green": "b1+2"})
print(ima.band_names)
>>> ["green_b1+2", "red_b1"]
☝️ we have then a band name that is "green_b1+2"
so if we wanted to use expression we would have to write
expression= "green_b1+2/red_b1"
but that won't work because rio-tiler will try to parse the main expression and re-apply! +2
to green_b1+2
!!!
One solution would be to use _b1
as the first band of the asset even if there is expression applied.
Status Quo ?
While I feel ☝️ is better, I understand that it's a big breaking change.
In ☝️ we have a 3 bands data but a 2 bands list ("b1", "b2"), and we construct the dict that will be passed numexpr. We can see that no errors is raised while we have 3 bands and only 2 names.
I've added a check
in https://github.com/cogeotiff/rio-tiler/pull/523/commits/d883df00f134aed83ad5c1097e4952ff827b812e to make sure the number of bands and data shape match
It means that in theory expression could be in form of expression="green_b1/red_b1"
I like this. We are breaking things anyways so might as well get it right now. Representing the full hierarchy of asset::band seems like the best way to support applying expressions across multi-band assets.
One solution would be to use _b1 as the first band of the asset even if there is expression applied.
I think appending _b1
is a good way to normalize applying of an expression to single and multi-band images.
Another question is what happens when we use asset_expression?
Can asset expressions be removed entirely if a user can pass assets to the expression
? It seems to me the difference between expression
and asset_expression
is the former may be used to combine data from multiple assets while the latter applies one expression to each asset individually. Which means everything you can do with asset_expression
you could accomplish with multiple calls to expression
:
# one asset_expression
asset_expression={"green": "b1+2", "red": "b1+3"}
# two expressions
expression="green_b1+2"
expression="green_b1+3"
Applying multiple expressions like this would be much easier if ImageData.apply_expression
did not mutate the ImageData
instance but instead created a new ImageData
instance with modified data. This way a user could do something like:
with STACReader("stac.json") as stac:
ima = stac.tile(701, 102, 8)
first_expression = ima.apply_expression("green_b1+2")
second_expression = ima.apply_expression("green_b1+2")
third_exression = ima.apply_expression("green_b1+2/red_b1")
This example ☝️ got me thinking that maybe including expression
in the COGReader.tile
function call is overloaded now that it is an instance method on the object returned by that call 🤷
🙏 thanks @geospatial-jeff
I like this. We are breaking things anyways so might as well get it right now. Representing the full hierarchy of asset::band seems like the best way to support applying expressions across multi-band assets.
+1
Can asset expressions be removed entirely if a user can pass assets to the expression
I think yes! +1
Applying multiple expressions like this would be much easier if ImageData.apply_expression did not mutate the ImageData instance but instead created a new ImageData instance with modified data.
Yes, just one thing: in your example you still need a way to tell the tile
method which assets to read
with STACReader("stac.json") as stac:
ima = stac.tile(701, 102, 8, assets=("green", "red"))
first_expression = ima.apply_expression("green_b1+2")
second_expression = ima.apply_expression("green_b1+2")
third_exression = ima.apply_expression("green_b1+2/red_b1")
This example ☝️ got me thinking that maybe including expression in the COGReader.tile function call is overloaded now that it is an instance method on the object returned by that call 🤷
Sadly we still need to pass the expression to tile
so it parse it and know which asset to read