Disentangling the order in which constraints and callback are applied

Open SimonPeatman opened this issue 4 years ago • 8 comments

📰 Custom Issue

I tried to run the following line of code and was surprised to find myself running into problems:

u_cubes = iris.load(filenames, domain & 'eastward_wind', callback=cb)

Here, domain is an iris.Constraint object and cb is a valid callback function. The callback function relies on the correct domain already having been extracted from the Cube, but this didn't happen! Therefore, I assume the callback function is applied first, then the constraint afterwards. However, only the 'eastward_wind' data were loaded from the file (other variables were also present) so in fact it seems to apply the string constraint first, then the callback function, then the Constraint object. I can work around this by putting cube = cube.extract(domain) at the top of the callback function, but Iris' behaviour seems counter-intuitive here.

  1. I couldn't see anything in the documentation about this, but I think this behaviour needs to be documented clearly.
  2. Is there any particular reason for the constraints and callback being applied in this order?

SimonPeatman avatar Jan 06 '21 13:01 SimonPeatman

@SimonPeatman Thanks for raising this issue, much appreciated.

First of all, I think the documentation should give clarity on the order of operations performed in the loading pipeline. That is clearly lacking, which hasn't helped you, so I'd like to remedy that for you and the community.

Also, just to confirm, constraints are applied first then the callback is applied afterwards, always.

Would it be possible for you to share with me here in this issue your domain constraint and your callback code?

This would really help me to investigate further. Thanks 👍

bjlittle avatar Feb 21 '21 21:02 bjlittle

Hi @bjlittle, below is a minimal working example. I've attached the two data files I'm using. Just to let you know, I'm running this with Iris 3.0.1.

import iris

# When applied, this constraint should subset the data from shape
# (58, 128) to shape (58, 86)
domain = iris.Constraint(longitude=lambda xx: 80 <= xx <= 140)

# Read in data to be subtracted, applying the domain constraint
cube_to_sub = iris.load_cube('u_data_to_subtract.nc', domain)
assert cube_to_sub.shape == (58, 86)

def cb1(c, f, n):
    """Callback function which applies the constraint
    then subtracts other data
    """
    c = c.extract(domain)
    c.data -= cube_to_sub.data
    return c

def cb2(c, f, n):
    """Callback function which subtracts other data,
    assuming present cube is already the right shape
    """
    c.data -= cube_to_sub.data
    return c

# The following works, because the c.extract() inside the callback
# function ensures the two cubes have the same shape
cube1 = iris.load_cube('u_and_v_data.nc', 'eastward_wind', callback=cb1)
assert cube1.shape == (58, 86)

# If the constraints are applied before the callback function then this
# should also work, because this cube should already have the correct
# shape before the callback function is called.
# However, it crashes with ValueError: operands could not be broadcast
# together with shapes (58,128) (58,86) (58,128)
cube2 = iris.load_cube('u_and_v_data.nc', domain & 'eastward_wind',
                       callback=cb2)

netcdf_files.zip

SimonPeatman avatar Feb 23 '21 19:02 SimonPeatman

Related discussion: #4185

rcomer avatar Jun 09 '21 21:06 rcomer

Bumping this thread because this issue has come up again for me this morning. Neither this thread nor #4185 has resolved the question of the order in which the constraints and callback are applied. It would make more logical sense for the constraints to be applied first, but here is an example where it seems to be the other way round:

def cb(c, f, n):
    c.coord('height_above_sea_level').convert_units('km')

# This file contains multiple variables
# I want to read in only one of these, called 'my_variable'
# This variable has a 'height_above_sea_level' AuxCoord
# The other variables do not
cube = iris.load_cube('filename.nc', 'my_variable', callback=cb)

This crashes with CoordinateNotFoundError: 'Expected to find exactly 1 height_above_sea_level coordinate, but found none.' so it must be applying the callback to all the variables in the file, not just the one I wanted.

SimonPeatman avatar Sep 21 '21 10:09 SimonPeatman

"It would make more logical sense for the constraints to be applied first"

Actually no, because often you may want to apply constraints on metadata which was added by a callback. So, a callback is applied to all 'raw cubes' loaded (before a merge step), and constraints are applied last.

So, in this case, you can't stop Iris initially loading all the data: the "my_variable" constraint performs a post-filtering of the cubes already created, and in fact it must do so because it needs a cube on which to perform its selection test.
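To make that ordering concrete, here is a toy model in plain Python. It is only a sketch of the general rule described above, not Iris code: `load_pipeline`, the dict-based "cubes" and the `record` callback are all hypothetical stand-ins, the merge step is skipped, and the low-level filtering exceptions are ignored.

```python
def load_pipeline(raw_cubes, callback=None, constraint=None):
    """Toy model (not Iris): callback on every raw cube, constraint last."""
    processed = []
    for cube in raw_cubes:
        if callback is not None:
            # The callback may modify the cube in place (returning None)
            # or return a replacement object.
            result = callback(cube)
            if result is not None:
                cube = result
        processed.append(cube)
    # (A real loader would merge the raw cubes here; omitted in this sketch.)
    if constraint is not None:
        processed = [cube for cube in processed if constraint(cube)]
    return processed


# Two hypothetical "raw cubes", represented as plain dicts.
raw = [
    {"name": "my_variable", "height_units": "m"},
    {"name": "other_variable"},
]

seen = []


def record(cube):
    """Hypothetical callback that just records which cubes it was given."""
    seen.append(cube["name"])


selected = load_pipeline(
    raw,
    callback=record,
    constraint=lambda cube: cube["name"] == "my_variable",
)
```

In this model the callback is handed both variables even though only one survives the constraint, which is why a callback that assumes a particular coordinate exists can fail on cubes you never asked for.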

The callback needs to be re-written so it can avoid the problem. Maybe just ...

def cb(c, f, n):
    for height in c.coords('height_above_sea_level'):
        height.convert_units('km')

(though in practice, you might want to ensure there are only 0 or 1 height coords)
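A minimal sketch of that stricter variant, assuming the same coordinate name as in the example; the choice to raise a `ValueError` when more than one matching coordinate is found is an assumption for illustration, not something Iris requires:

```python
def cb(cube, field, filename):
    """Convert 'height_above_sea_level' to km on cubes that have it.

    Cubes without the coordinate are left untouched, so the callback is
    safe to run on every variable in the file.
    """
    heights = cube.coords('height_above_sea_level')
    if len(heights) > 1:
        # Assumption for this sketch: more than one match is treated as an error.
        raise ValueError(
            f"Expected at most 1 height_above_sea_level coordinate in "
            f"{filename!r}, found {len(heights)}."
        )
    for height in heights:
        height.convert_units('km')
```

It would be passed in the same way as before, e.g. `iris.load_cube('filename.nc', 'my_variable', callback=cb)`.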

However, as explained on #4185, there are also some specific exceptions to this, where constraints of a particular limited type are "translated" into a filtering process in the low-level loading, purely for performance reasons.
Admittedly, this part has not been clearly explained anywhere so far!

pp-mo avatar Sep 21 '21 11:09 pp-mo

The latest intent is to write up the current logic, possibly in a Technical Paper, after which it might be possible to have further discussion.

trexfeathers avatar Jan 18 '23 10:01 trexfeathers