Ribasim icon indicating copy to clipboard operation
Ribasim copied to clipboard

Support `storage` column in `Basin / profile`

Open visr opened this issue 7 months ago • 21 comments

Since #280 we define profiles with $A(h)$, integrating in the core to get the $S(h)$. For many reservoirs only the $S(h)$ is known, and asking users to convert to $A(h)$ is error-prone.

We can add a column storage such that users can fill this in instead of area.

Testing task Also, from Wilson test case, and get interpolated output (i.e., csv) of basin level, storage, area relationship (profile).

visr avatar Apr 24 '25 13:04 visr

For the moment the user should either define level-storage or level-area, where a validation error is given if both are defined, and an INFO message is given to warn a modeller when some basins are with level-area and other basins with level-storage (as this is error prone as well)

gijsber avatar Apr 24 '25 15:04 gijsber

Why is it error prone? Using info to warn users seems wrong.

Regarding the interpolation behavior, currently we linearly interpolate $A(h)$, and this leads to a piecewise quadratic $S(h)$.

See e.g. https://github.com/Deltares/Ribasim/blob/fdca41625dd62dcd417edac3bf320be8ae795cc9/core/src/read.jl#L968-L976 https://docs.sciml.ai/DataInterpolations/stable/inverting_integrals/ https://github.com/Deltares/Ribasim/issues/225#issuecomment-1554110892 And it is documented quite well: https://ribasim.org/reference/node/basin.html#profile

My question is, if I understand @JohnKucharski we should linearly interpolation a given $S(h)$, meaning that the derived $A(h)$ would be piecewise quadratic I think.

Image

This may start to get a bit more difficult if our interpolation method depends on which parameter is supplied. Of course if enough data points are given, it doesn't make a big difference. We could perhaps also decide that if a user supplies $h(A)$, we interpolate in a way such that we can linearly interpolate $S(h)$.

visr avatar Apr 25 '25 10:04 visr

Hi thanks for creating this issue Martijn. Just to know if I am getting all this:

Today in Ribasim, if you only know the 𝑆(ℎ) relation, you must manually compute 𝐴(ℎ) (by differentiating). This can go wrong especially if I (the user)

  • have a few data points (sparse h-S(h))
  • The points are noisy
  • The geometry of the reservoir is not simple (non-smooth storage curve)

Writing this up for me as an example in how manual derivation can go wrong (please correct me if I'm wrong) Suppose someone measures the following water levels and storages:

Level (h) Storage (S)
0 0
1 1.2
2 4.1
3 9.3
4 16.8

Now, to find 𝐴(ℎ) manually, I'd compute finite differences:

Between ℎ=0 and ℎ=1 𝐴(0.5)≈1.2−0/1−0=1.2 Getting:

h A(h)
0.5 1.2
1.5 2.9
2.5 5.2
3.5 7.5

Meaning (problems):

  • I now have area values at "midpoints" (0.5, 1.5, ensoforth) not exactly where I need them (0, 1, 2, etc.).
  • Interpolates again if Ribasim expects area at certain levels. introduces more assumptions
  • The accuracy depends on how closely spaced my levels are

But in real life:

  • Reservoir surveys are expensive, so data is often sparse.
  • Measurements are noisy.
  • Geometries can have weird features (plateaus, sudden expansions). So usually not simple curves.

Therefore, even with reasonable number of points, there's a real risk of getting a wrong A(h). But relying on users to do this carefully is risky, so this is why Ribasim should ideally support using h-S(h) directly.

Fati-Mon avatar Apr 25 '25 12:04 Fati-Mon

Yeah I think that's right. Note that $A(0.5) = 1.2$ can be correct if the reservoir walls are vertical between 0 and 1, but also if there is a slope. When we supply area we need to give the flat area of the bottom as well.

visr avatar Apr 25 '25 13:04 visr

@SouthEndMusic and @JohnRushKucharski will discuss the preferred option from https://github.com/Deltares/Ribasim/issues/2254#issuecomment-2829966042. I believe John indicated interpolating linearly between storages was preferred, but Bart pointed out that for the solver the piecewise quadratic interpolation that results from linearly interpolating areas will be more smooth, as we avoid discontinuities in the derivative at the data points.

visr avatar Apr 29 '25 08:04 visr

My question is, if I understand @JohnKucharski we should linearly interpolation a given $S(h)$ meaning that the derived $S(h)$ would be piecewise quadratic I think.

That is not true. The $A(h)$ is the derivative of $S(h)$, so when $S(h)$ is piecewise linear, $A(h)$ is piecewise constant (that is, if you want to have a mathematically consistent Basin profile description). The solver won't like that for evaporation.

My idea is to always use the interpolation that we have now, and so if $S(h)$ is given, we interpolate that piecewise quadratically:

Say we have data points $(h_i, s_i)$ for $i = 1, \ldots N$ where $s_i$ and $h_i$ are strictly increasing and $s_1 = 0$. We want to interpolate this data piecewise quadratically and $C^1$ smooth;

$$ f_i(h) = a_i(h - h_i)^2 + b_i(h-h_i) + s_i, \quad h_i \le h \le h_{i+1}, \quad i = 1, \ldots, N-1. $$

By this definition already $f(h_i) = s_i$ for all $i = 1, \ldots, N-1$. For the right value at the end point of each interval we have, with $\Delta h_i = h_{i+1} - h_i$, $\Delta s_i = s_{i+1} - s_i$:

$$ f_i(h_{i+1}) = s_{i+1} \quad \Rightarrow \quad a_i\Delta h_i^2 + b_i\Delta h_i = \Delta s_i, \quad \quad i = 1, \ldots, N-1. $$

Lastly, for differentiability in the data points we have

$$ \frac{\text{d}f_i}{\text{d}h}(h_{i+1}) = \frac{\text{d}f_{i+1}}{\text{d}h}(h_{i+1}) \quad \Rightarrow \quad 2a_i\Delta h_i + b_i = b_{i+1}, \quad \quad i = 1, \ldots, N-2. $$

Note that we have $2(N-1)$ unknowns, and $2(N-1)-1$ equations. This means that we have a degree of freedom left, which we can get rid of by assuming $a_1 = 0$, i.e. $f_1$ is linear. Now we can solve a linear system for the unknowns $a_i$ and $b_i$ (which can actually be solved by a simple loop), and the derivative yields a piecewise linear continuous $A(h)$ (constant in the first interval).

SouthEndMusic avatar Apr 29 '25 09:04 SouthEndMusic

Sounds good to me. We also need the first entry in the profile to give us the bottom elevation and area. We can validate that the first entry contains S = 0. What do we assume for the area at S = 0? We currently require this to be larger than 0. Should we assume vertical walls in the first segment?

visr avatar Apr 29 '25 11:04 visr

With this method the area at the bottom is given by finite difference:

$$ A_1 = \frac{s_2 - s_1}{h_2 - h_1} $$

with the assumption at the bottom of my previous post indeed the walls are vertical from $h_1$ to $h_2$ with this area.

SouthEndMusic avatar Apr 29 '25 11:04 SouthEndMusic

Hmm this doesn't work quite yet because having positive areas everywhere isn't guaranteed.

SouthEndMusic avatar Apr 29 '25 13:04 SouthEndMusic

I just want to add that evaporation will never be a large outflow. Piecewise constant is therefore perhaps not terrible, if that will otherwise simplify things.

visr avatar Apr 29 '25 14:04 visr

Sorry, I'm joining this discussion late.

From Peter's comment:

For the moment the user should either define level-storage or level-area, where a validation error is given if both are defined, and an INFO message is given to warn a modeller when some basins are with level-area and other basins with level-storage (as this is error prone as well)

  1. For a single node, if possible a user should be allowed to provide the level-storage-area relationship as a set of tuples, i.e. {$(h_0, S(h_0), A(h_0)), (h_1, S(h_1), A(h_1)), ..., (h_n, S(h_n), A(h_n))$}. The reason is that both the S(h) and A(h) relationships may be very irregular over the domain of h. If a user plans to use both S(h) and A(h), for example S(h) to define reservoir control rules, and A(h) to compute evaporation, then I think we would want them to add additional tuple points at wherever a kink in the S(h) or A(h) relationship occurs.

  2. For multiple nodes, if some relationships are given only in terms of S(h) and others are given only in terms of A(h), then it seems likely that the program may make assumptions about the other relationship, i.e., S(h) if only A(h) is given that are reasonable but untrue and may produce unexpected results that could be difficult for a user to diagnose, particularly I think as $h \to 0$.

ghost avatar Apr 30 '25 07:04 ghost

Regarding the discussion about interpolation. My main thought is given a set of points: {$(h_0, S(h_0), A(h_0)), (h_1, S(h_1), A(h_1)), ..., (h_n, S(h_n), A(h_n))$}, or some variant (like where S(h) is defined but A(h) is not) we want to make a minimum number of simple assumptions about the shape of S(h) or A(h) between any two of the provided adjacent points (i.e. k, k+1). This is because a good reason to include the points, k and k+1 is that the relationship between h and S(h) or A(h) changes or could even be discontinuous around k or k+1 (i.e., imagine a reservoir with a trapezoidal bottom that becomes rectangular at the top).

That said, if piecewise linear causes problems in the code, then I think some smoothing is OK. I would try to perform that smoothing over as small a domain in the vicinity of k or k+1 as possible.

It is something we would want to document too, because I think a user's intuition would be linear interpolation between points - unless we tell them otherwise. Also, if we deviate from a linear shape for much of the space between k and k+1 then they would want to modify the set of points they provide.

ghost avatar Apr 30 '25 07:04 ghost

@JohnRushKucharski-Deltares I'd like to refer you to our earlier discussion here: https://github.com/Deltares/Ribasim/issues/225. We used to have level, area and storage input before that, but I brought up the point that when the A(h) relationship is known, the S(h) relationship follows from that by integration, and the other way around by differentiation. Having them both as piecewise linear interpolations is mathematically inconsistent.

If a user plans to use both S(h) and A(h)

This is not an active decision a modeler makes, S(h) is always needed for basins and A(h) for evaporation (the latter being optional).

To give a bit more insight into what is happening in the code now: A(h) is a linear interpolation, and what we often have to do is to obtain the level from the storage, which means solving an integral equation. That is described here: https://docs.sciml.ai/DataInterpolations/stable/inverting_integrals.

Do I understand correctly from your posts that you would like it to be possible to define a basin profile by providing some area and some storage data, not necessarily at the same levels?

that the relationship between h and S(h) or A(h) changes or could even be discontinuous around k or k+1 (i.e., imagine a reservoir with a trapezoidal bottom that becomes rectangular at the top).

At the moment we don't support any discontinuities in the profile. We put quite some effort into making all relationships as smooth as reasonably possible, to increase the performance of the Newton solve for the implicit solver we use by default. That is also the reason that I advocate to keep the piecewise quadratic S(h) relationship.

SouthEndMusic avatar Apr 30 '25 08:04 SouthEndMusic

Hi @SouthEndMusic , I think the key is: S(h) and A(h) should be treated as independent relations: S(h) for the storage, A(h) for the evaporation flux

whether both S(h) and A(h) are provided whether S(h) is given and A9h) is derived whether A9h) is given and S(h) is derived

gijsber avatar Apr 30 '25 08:04 gijsber

Sorry if my comment was confusing...

@SouthEndMusic I agree with @gijsber's comment, S(h) and A(h) should be treated as independent relationships.

Regarding this question:

Do I understand correctly from your posts that you would like it to be possible to define a basin profile by providing some area and some storage data, not necessarily at the same levels?

I was imagining that the user provides only one of the following:

  1. a single set of (h, S(h)) relationships,
  2. a single set of (h, A(h)) relationships, or
  3. a single set of (h, S(h), A(h)) relationships

I wasn't imagining a case where they provide sets 1 and 2 for the same basin, if that was your question.

Regarding this part of your comment:

At the moment we don't support any discontinuities in the profile. We put quite some effort into making all relationships as smooth as reasonably possible, to increase the performance of the Newton solve for the implicit solver we use by default. That is also the reason that I advocate to keep the piecewise quadratic S(h) relationship.

I wasn't arguing in opposition to using the quadratic relationships you've suggested.

My main point was that when selecting a set of h, f(h) points to provide [where f(h) is S(h) or A(h)] a user's instinct will be to:

  • add few points in regions where the real-world f(h) relationship is approximately linear, and
  • add many points where the real-world f(h) relationship is non-linear or discontinuous.

So, if we are going to make the f(h) relationship non-linear between the provided points (to smooth the function), we'll need to either: (a) document this and give the user specific guidelines about how to choose the set of h, f(h) points they provide. Or (b) do something to limit the range over which we smooth the function. Does that make sense?

ghost avatar Apr 30 '25 09:04 ghost

If we view A(h) and S(h) as independent relationships (which they geometrically aren't, but I'll go along with that reluctantly if I have to), then S(h) has to be provided always (for output, translating initial levels to storages, translating storages to levels internally), and A(h) when evaporation data is available. We can then interpolate A(h) linearly, and rethink how we do h(S) interpolation. I actually worked on a linear interpolation with smoothed transitions earlier: https://southendmusic.github.io/CachedInterpolations.jl/stable/examples/. We can use this by defining the h(S) interpolation directly from the data instead of inverting the S(h) interpolation, which is much more expensive.

SouthEndMusic avatar Apr 30 '25 10:04 SouthEndMusic

@SouthEndMusic, that sounds like a good plan to me. One minor point on invertability, because it came up in our discussion. We can safely assume $S^{-1}(h)$ exists as a function because S(h) is strictly increasing, but A(h) is only monotonic non-decreasing, i.e. more than one h can map to the same A(h) value (i.e. think of a reservoir (or any container) with vertical sides) so A(h) may not be invertable.

ghost avatar Apr 30 '25 10:04 ghost

A(h) not being invertible is fine, I cannot think of a reason why you would need that

SouthEndMusic avatar Apr 30 '25 10:04 SouthEndMusic

At the risk of "yet another opinion": this doesn't have to be solved in the Julia kernel, right?

The essential issue as posted by @visr, to me, seems:

asking users to convert to A(h) is error-prone.

Hence, if we were to provide a reasonable (Python) pre-processing function, it would cease to be error prone. It also makes the result explicit and legible: the A vs h table that you provide as Ribasim input. We also need not discuss about the optimal form: the user can choose via some option. A(h) remains continuous as a bonus.

Separating A(h) and S(h) sets up a footgun and will almost certainly cost someone dearly in the future; a user will not read all the fine print, and, reasonably, assume that they are obviously geometrically related. They will run a scenario and update one parameter, but not the other, find out weeks later and will have to re-run all their scenarios (in the weekend) -- this is always how it goes with modeling.

I think a piecewise S(h) relationship seems attractive because it feels the simplest: "I have storage values and level values, just draw a line through them", but it's a conceptual trap. Very few hydrological systems are purely a sequence of plateaus, in terms of elevation. Sure, there might be a plateau or sudden expansion as @Fati-Mon mentions, but these can be approximated with a small fictitious slope (again, this can be an option in a pre-processing function, although the real plateau probably still has a small real physical slope). Ask a hydrologist to draw a storage curve, and they will base it on some elevation profile; i.e. a piecewise linear A(h). Of course, this necessarily implies a quadratic S(h)!

Relatedly, this is an interesting remark:

For many reservoirs only the S(h) is known

How is it known? Probably through observation of h and A in combination with bathymetry? There is likely a coherent conceptual model that's used to go from A -> S, and that should allow going the other way as well.

Huite avatar Apr 30 '25 13:04 Huite

@Huite, when thinking in terms of geometry you prioritize physical consistency and smooth functions. From a hydrologist's perspective, indeed, you prefer A(h) → S(h) since it aligns with how topographic surveys are processed. I completely agree.

However, in practice a reservoir operator usually work from measured data: Water level (h) from a gauge and Storage (S) from for example volume tracking during filling/drawdown (Bathymetric surveys are costly and time-consuming, especially for deep or sediment-laden reservoirs). They often do not have access to full bathymetry or detailed A(h) relationships, especially in older reservoirs, or in regions without detailed bathymetric surveys. (think about many irrigation reservoirs in sub-Saharan Africa, such as in Morocco they have never had detailed bathymetric surveys).

Even if the original design had A(h), the reservoir may have changed over time (sedimentation, reconstruction), making it unreliable. Also the operator needs fast operational estimates or rules (if h = 10 m → storage = 2.3 million m³). Thus, prefers S(h) as primary input. @JohnRushKucharski-Deltares as a reservoir modeler please correct me if I am wrong here.

So why piecewise linear S(h) isn't necessary a "conceptual trap", well first it matches how data is collected and used operationally. Second, it’s intuitive and often sufficiently accurate, especially when interpolation is over short ranges or many points are available. It avoids forcing users (like me at the moment, very frustrating I might add :p) to convert to area profiles or derive uncertain A(h). So no, reservoir operators don’t necessary think like hydrologists, and software like Ribasim that is indented for many stakeholders should respect that difference. Supporting S(h) directly isn’t just about convenience, it’s about aligning with users’ mental models and the kind of data they actually have.

Fati-Mon avatar May 07 '25 06:05 Fati-Mon

@Fati-Mon

You're right of course, you can measure S-h directly with e.g. a hydropower dam, I hadn't considered that yet!

I'll argue there still lies a trap in waiting, since a reservoir model needs to some measure of area to estimate evaporation. So necessarily, a S-A or h-A is implicitly required. How is the water surface area typically estimated in reservoir modeling? As a (piecewise) constant? If you take aerial photos at N times and measure N water levels at those times, and then linearly interpolate the area, the result is once again a piecewise cubic all the same.

often sufficiently accurate, especially when interpolation is over short ranges or many points are available.

Indeed, but vice versa, a piecewise cubic is close enough as well?

My suggestion remains to first solve this with a Python function pre-processing function:

def basin_profile_from_h_S(h, S, area_method: Literal["linear", "piecewiseconstant"]):
    """
    h: water level [m]
    S: storage [m^3]
    area_method: "linear" or "constant"
          Whether to generate a piecewise linear area values (implies piecewise quadratic storage values),
          or whether to generate piecewise constant area values (implies linearly interpolated storage values).
   """
    ...

You might need more options for what to do at the bottom, etc.

Then for debugging / introspection:

def plot_basin_profile(...):
     # Make sure to add logic to plot the piecewise quadratic storage curve

There are multiple benefits to making a pre-processing function:

  • No breaking changes to the model input are required.
  • Easy experimentation with different schemes (e.g. bit far fetched, but as an example: add option to use bathymetry information).
  • The user is forced to make a conceptual choice and is thereby made aware of a potential issue (again, as you mention, linear interpolation of storage seems intuitive, but it's geometrically incoherent when coupled to a linearly interpolated area values).
  • Argubably, most important: the Julia core remain simple and coherent.

I think it's worthwhile to be principled about this. At Deltares, we have plenty of experience with numerical cores "trying to be helpful" by doing all kinds of things: ignoring and fixing inconsistencies, automatically resampling data (in time and space), many implicit defaults, etc. The end result is a model which is easy to get running, but it's also complicated and untransparent -- you're never sure what it's doing behind the scenes. My view is that such conveniences should be moved to preprocessing if possible. The model input is accesible to everyone; the model internal state is only accessible to developers (with a debugger or print statements).

If the Python preprocessing functionality is a big succes, we can always decide to move it into the Julia core. But there are surely lessons along the way!

Huite avatar May 07 '25 15:05 Huite