scikit-gstat icon indicating copy to clipboard operation
scikit-gstat copied to clipboard

Support for Cross Variogram

Open digital-idiot opened this issue 3 years ago • 4 comments

Support for cross variogram would be great.

digital-idiot avatar Sep 15 '20 15:09 digital-idiot

Hey there, Yes, that would be great. Unfortunately, it would also be a considerable amount of work, that I cannot afford right now. The main reason is, that the Variogram class would need a second input for the co-variable. One of the very early design decisions I made, was to require only the coordinates and one array of values from the user. Breaking this up would mean to change the class all over the place. Thus, from my point of view, it would be easier to introduce a new class for Cross variograms. The other reason is, that the next issue will most likely ask for CoKriging, which would double the amount of work ;)

Nevertheless, implementing cross variograms for just one specific application should not be too complicated. If I can help, I will happily try to give some hints how to do that. You are also very welcome to implement and contribute a CrossVariogram class yourself, and I will happily give you some guidance, how Variogram works and at which places changes would be necessary. Sorry but best,

Mirko

mmaelicke avatar Sep 16 '20 05:09 mmaelicke

I would very much like to contribute. As you mentioned, a few pointers from you would be help me jump start, otherwise I have to read a huge chunk of code very carefully. Thanks for your help.

digital-idiot avatar Sep 16 '20 07:09 digital-idiot

Hey, Thanks, sounds great! I am not so sure anymore if it is really that much work. But first, I have to say, that I am not too familiar with cross-variograms. I read the 1998 paper by Cressie about it, but have never applied it so far anywhere.

So my suggestion would be to introduce a new class called CrossVariogram, inheriting from Variogram as a first step. As soon as we see the differences side-by-side, we might back-implement it into the existing Variogram class.

As far as I get it, we can assume, that instead of having a N-dimensional array of coordinates and a one dimensional array of observations (of same length), we have two one dimensional arrays (let's call them v and w) for the same coordinates, right? Basically, what we need to do is to change the way how the Z(v) - Z(v +h) part is calculated. All we need is to change this part to Z(v) - Z(w) for each combination of v,w.

These are the pairwise differences, stored in an array defined here: https://github.com/mmaelicke/scikit-gstat/blob/0257d4ec21f5d54fb5f08092b48c964d4192be3a/skgstat/Variogram.py#L158

The structure of this array is the same as for the distance, which is the row-wise upper triangle of the distance matrix. For _diff, it is just not the * distance* in space, but the difference in value.

You can obtain the content using this property: https://github.com/mmaelicke/scikit-gstat/blob/0257d4ec21f5d54fb5f08092b48c964d4192be3a/skgstat/Variogram.py#L259

These should both be fine. The array is filled by this function:

https://github.com/mmaelicke/scikit-gstat/blob/0257d4ec21f5d54fb5f08092b48c964d4192be3a/skgstat/Variogram.py#L1019

which further uses this function to yield the correct indexes: https://github.com/mmaelicke/scikit-gstat/blob/0257d4ec21f5d54fb5f08092b48c964d4192be3a/skgstat/Variogram.py#L1040 (btw. something that I always wanted to implement more properly... =))

I I am not mistaken, the __vdiff_indexer is also fine, as the the indices of the value are the same as long as the two observation arrays are obtained at the same points, in the same order. If not, we would have to change that. Not sure if that makes sense at all, just thinking loud.

So back into _calc_diff, there are two important lines: https://github.com/mmaelicke/scikit-gstat/blob/0257d4ec21f5d54fb5f08092b48c964d4192be3a/skgstat/Variogram.py#L1031

here, we basically need a v and a w, where w needs to be implemented (which is a bit of work, as self.values is a property, not an attribute). With that change in place, https://github.com/mmaelicke/scikit-gstat/blob/0257d4ec21f5d54fb5f08092b48c964d4192be3a/skgstat/Variogram.py#L1037

would be changed to something like:

for t, k in zip(self.__vdiff_indexer(), range(len(self._diff))):
            self._diff[k] = np.abs(v[t[0]] - w[t[1]])

Finally, we need to implement two value arrays, that hold the two covariables. The way they are handled can be found in the property values: https://github.com/mmaelicke/scikit-gstat/blob/0257d4ec21f5d54fb5f08092b48c964d4192be3a/skgstat/Variogram.py#L236

and its setter: https://github.com/mmaelicke/scikit-gstat/blob/0257d4ec21f5d54fb5f08092b48c964d4192be3a/skgstat/Variogram.py#L255

That's what we need twice. Just a short note. I would not implement the variable and the covariable as a 2d-array, as that is the input format I use in SpaceTimeVariograms to pass time series data. We should use another notion for cross-variograms.

Finally, if I didn't miss anything, it should be possible to implement cross variograms by introducing a cross_variable attribute and at the few places above check if that attribute is set.

If variable and co-variable can be observed at different locations (not sure if that makes sense), we would have to keep the new class and re-implement the distance matrix, to calculate all distances across both sets of coordinates.

mmaelicke avatar Sep 17 '20 07:09 mmaelicke

Hey there, @digital-idiot , just out of curiosity, is there any progress on this issue? Best, Mirko

mmaelicke avatar Dec 01 '20 19:12 mmaelicke

Thanks @mmaelicke for finally implementing this. I have been very busy with other things and it skipped my attention.

digital-idiot avatar Dec 02 '22 13:12 digital-idiot