iris
Extracting a time range from a cube is slow
📰 Custom Issue
Extracting a time range as described in the documentation is slow when it is done for many cubes and/or for cubes with many time points. For a single cube with 10000 time points it already takes about 2 seconds on my computer, so subsetting a few hundred cubes adds up quickly.
Here is a script that demonstrates this:
import cf_units
import iris.cube
import iris.coords
import iris.time
import numpy as np
time_units = cf_units.Unit('days since 1850-01-01', calendar='standard')
time = iris.coords.DimCoord(np.arange(10000, dtype=np.float64), standard_name='time', units=time_units)
cube = iris.cube.Cube(np.arange(10000, dtype=np.float32))
cube.add_dim_coord(time, 0)
pdt1 = iris.time.PartialDateTime(year=1852)
pdt2 = iris.time.PartialDateTime(year=1854)
constraint = iris.Constraint(time=lambda cell: pdt1 <= cell.point < pdt2)
%timeit cube.extract(constraint)
Result:
1.83 s ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
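As a rough sketch of one possible workaround (not something iris does for you): since the cost comes from turning every time point into a datetime, the two bounds can be turned into numbers instead and compared against the raw points. The example below uses only `datetime` and NumPy and assumes a standard calendar with dates after 1582, where Python's `datetime` arithmetic agrees with the coordinate's calendar. The names `epoch`, `lower`, `upper`, `start` and `stop` are illustrative only.

```python
from datetime import datetime

import numpy as np

# Same time axis as in the script above: days since 1850-01-01.
epoch = datetime(1850, 1, 1)
points = np.arange(10000, dtype=np.float64)

# Convert only the two bounds to numbers, instead of 10000 points to dates.
# Assumption: standard (Gregorian) calendar and dates after 1582.
lower = (datetime(1852, 1, 1) - epoch).days
upper = (datetime(1854, 1, 1) - epoch).days

# The points are sorted, so a binary search gives a contiguous half-open
# range [start, stop) covering lower <= point < upper.
start, stop = np.searchsorted(points, [lower, upper], side="left")
selected = points[start:stop]
```

For a one-dimensional cube like the one above, the same range could presumably be applied by indexing, e.g. `cube[start:stop]`. This only helps when the time points are monotonic and the calendar assumption holds.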
From looking at the code in iris.coords, it looks like the slow behaviour is caused by converting all time points to datetimes individually for each cell, instead of converting them once and then generating the cells.
Here is some code with timings:
%timeit time.units.num2date(time.points)
27.3 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
and
%timeit list(time.units.num2date(p) for p in time.points)
1.53 s ± 29.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If this is of interest, I can make a pull request that changes the code so that it first converts all the time points and then generates the cells.
This was previously raised in #3609, which went stale. So I think this is a desirable feature that no one has got around to addressing yet.
Fancy taking it on @rcomer ? :wink:
I think @bouweandela was offering to put something up, and has clearly already given it more thought than I have!
Yes, I already tried to implement something. I'll open a pull request and we can take it from there.
Just opened a pull request here: https://github.com/SciTools/iris/pull/4969