flopy
flopy copied to clipboard
interpolating z-data along line to grid
I was wondering whether interpolating z-data along a line to a modelgrid is something that could be included in flopy (gridintersect) utils? I saw there was also a stackoverflow post asking a similar question.
I've written a bit of code that takes a linestring and an array containing the points at which z-data is defined and calculates the z-values for grid cells that the linestring intersects. I'm using the GridIntersect code to get the intersection and shapely for projecting points onto the linestring. Plotting the final result looks something like this:

If this is something that you'd like to see added, I can submit a PR. And if so, would it make sense to add this functionality to the GridIntersect toolset?
@dbrakenhoff, this is clearly a useful contribution and something that would be great to have in flopy. I have a couple of quick questions for discussion in order to think about how this fits into a general capability.
- I suppose there are a couple of different ways this could be done. Is the line broken up into individual line segments, where the individual line segments fall completely within the cell? (So for your example above, is that LineString divided into 27 individual line segments?) And if so, do you then interpolate a value to the midpoint of the individual line segments?
- Can you show us what the syntax might look like for this operation and the output that is returned?
- Is there corresponding behavior that could be implemented for points and polygons?
- If we ultimately want to build a Stream Flow Routing network (or something else that requires connectivity), we need to know how the individual line segments connect to one another (what is the ID of the previous line and the ID of the next line).
One other thing:
- What happens if the line goes through the cell, out of the cell, and then back into the cell? I'm guessing they are listed in the output as separate individual line segments with different interpolated values? Or maybe there is a flag to aggregate?
Good questions! There's quite a few cases to consider here.
The current implementation I wrote works like this:
- Get the x- and y-cellcenters of the gridcells that intersect with the linestring.
- Calculate the distance along the linestring for each cellcenter by projecting the cell centers onto the linestring (i.e. get the point on the linestring nearest to the cell center).
- Do the same for the points at which z-data is defined. Now everything is defined as a distance along the linestring.
- Calculate the interpolation weights for each of the projected cell-centers (= the distances to the surrounding points with defined z-values).
- Multiply the weights by the z-values to get the interpolated z-value for each cell-center.
So to answer your questions:
- I suppose there are a couple of different ways this could be done. Is the line broken up into individual line segments, where the individual line segments fall completely within the cell? (So for your example above, is that LineString divided into 27 individual line segments?) And if so, do you then interpolate a value to the midpoint of the individual line segments?
In my case right now the point on the linestring nearest to the cell center determines its value. So i guess a sort of method="nearest" implementation. And it only supports single linestrings, so no branching, or complicated networks.
- Can you show us what the syntax might look like for this operation and the output that is returned?
ls = shapely.geometry.Linestring(...) # linestring
ix = GridIntersect(modelgrid) # for some MODFLOW modelgrid
xyz = np.array([[95, 105, 10],
[30, 50, 20],
[0, 0, 0]]) # pts with z-values (x, y, z)
zi = ix.interpolate_z_along_linestring(ls, xyz)
#-> returns a a rec-array (same as intersection result) with cell IDs and interpolated z-values
- Is there corresponding behavior that could be implemented for points and polygons?
- This implementation is for points with defined z-values on (or near) a simple line.
- If each line segment has a z-value that holds for the entire segment (or a value at the start and end of each segment), you'd need a slightly different approach. But the approach depends on how that data is stored. If you have a shapefile with attributes, you're better off looping through segments, doing the intersections with the grid and storing the attributes you want to keep with the intersection result. If the z-values are stored in the shapely geometry, then another approach is probably needed as these are not kept after doing intersections I believe...
- For more complicated linestrings with a z-value per line segment you're maybe better off calculating a weighted z-value based on the length of all the different line segments in that grid cell.
- For polygons, I think you'd collect all the intersection results, store the z-attributes, groupby cell-ids, and do some aggregation (i.e. area-weighted).
- If we ultimately want to build a Stream Flow Routing network (or something else that requires connectivity), we need to know how the individual line segments connect to one another (what is the ID of the previous line and the ID of the next line).
This information is not stored by the intersection result but it should be possible to retrieve this information based on the resulting geometries or the stored vertices... but this could get quite challenging for large (branching) networks?
- What happens if the line goes through the cell, out of the cell, and then back into the cell? I'm guessing they are listed in the output as separate individual line segments with different interpolated values? Or maybe there is a flag to aggregate?
Right now the point nearest to the cell center determines the cell's value. (In the intersection output all of the segments in a grid cell are stored together.)
I made a notebook showcasing the code I wrote, and included some different cases. Perhaps this is a useful starting point for figuring out which operations we can include in flopy and which are best left to the user to implement.
Thanks @dbrakenhoff for putting this together. It would be good to get some feedback from @aleaf, @mnfienen-usgs, and others who are working on similar capabilities. Any thoughts?
I think this still needs some more thought, so any feedback is welcome! And otherwise, I'll think about it some more and submit a PR (after the next flopy release probably).
Thanks @dbrakenhoff for putting this together. It would be good to get some feedback from @aleaf, @mnfienen-usgs, and others who are working on similar capabilities. Any thoughts?
FYI, we've been working on something similar at Aquaveo. For now all we are interested in is the parameterized distance (between 0.0 and 1.0) from the start of the linestring to each intersection point. We call these "t_values" (naming is hard) and there is one t_value per intersection vertex returned in the GridIntersect results. From the t_values we calculate values for the cells, like river stage and bottom elevation, by interpolating the data stored at the ends of the linestring.
We got it working using some geometric functions and not using shapely. I'd be happy to share the code if anyone is interested.
Sorry, must have missed this thread the first time around. Off the top of my head, I can't think of an instance where I've had to do exactly what is being described (if I'm understanding it right). For sampling feature data to a model grid, I've often used the rtree package, which provides a fast way of determining bounding box intersections for large numbers of features, and then shapely to get the intersections of the actual features. Although I think this can be done more easily at the same speed in geopandas now with the sjoin method. For reconstructing the order of individual line fragments representing intersections with model grid cells, the general procedure in SFRmaker is
- For each original line, get a list of intersected grid cells, and the corresponding line fragments using rtree and shapely
- identify the start and end points of the original line
- Find the line fragment with the starting point closest to the start of the original line (they should coincide). Identify the end point and make that the new start point. Add the line fragment and the corresponding grid cell number to a list.
- Repeat step 3 until there are no more fragments
The reason I opened this issue was because of the following cases I ran into:
- I have a shapefile describing a river (linestring or polygon) and several waterlevel measurement locations at certain points along this river. I want to to set my river stage based on the interpolation between these waterlevel observations.
- Another example would be with a shapefile of a river (linestring) with attributes that define the bottom elevation at the start and endpoints.
It seemed logical to me to extend gridintersect to allow calculation of these interpolated values. So instead of returning a rec-array with intersection results (i.e. length, area, depending on the shape), you can also get a rec-array with interpolated values where the feature intersects the grid.
But there might be other more logical ways to go about getting this data into a modelgrid?
We got it working using some geometric functions and not using shapely. I'd be happy to share the code if anyone is interested. @mkennard-aquaveo I'd be interested to see your code! It sounds like a simple and elegant method.
So instead of returning a rec-array with intersection results (i.e. length, area, depending on the shape), you can also get a rec-array with interpolated values where the feature intersects the grid.
The "t_values" approach I mentioned gives you the interpolated values where the feature intersects the grid, just scaled between 0.0 and 1.0. Thus they are still pure geometry like the rest of the GridIntersect results but they can be reused to interpolate whatever attributes are stored at the endpoints, like river bottom elevation, river stage etc. It seems like a simple approach as these could just be added to the recarray containing length, vertices etc. and GridIntersect wouldn't have to deal with the MODFLOW attributes.
For example, to compute the river stage at a cell, we interpolate like this: stage_cell = stage0 + (stage1 - stage0) * ((t_value_cell_first + t_value_cell_last) * 0.5) where stage0 and stage1 are the stages at the beginning and end of the line, respectively, and t_value_cell_first and t_value_cell_last are the first and last t_values for the cell. With real numbers, it might look like 9.125 = 10.0 + (9.0 - 10.0) * ((1.0 + .75) * 0.5).
@dbrakenhoff what is the status of this issue?
Still WIP, moving it to the next release. I'll take another look somewhere this month!