DIVAnd.jl
"gradient" effect on DIVAnd maps[🐞]
Describe the bug
**diva3d call**
```julia
@time info = diva3d((lonr, latr, depthr, TS),
    (obslon, obslat, obsdepth, obstime),
    obsval, len, epsilon2,            # epsilon2 = 1 (profile) or 10 (timeseries)
    outputfile[s], varname,
    bathname = bathfile,
    mask = mask,
    background = bkgd,                # seasonal profile
    #plotres = plotres,
    transform = Anam.loglin(maxval),
    ncvarattrib = ncvarattrib,
    ncglobalattrib = metadata[s],
    surfextend = true,
    coeff_derivative2 = param_d2,     # param_2d = [0,0,0]
    memtofit = 50);
```
I do not think it is a bug. Both situations arise in regions where you basically extrapolate and where a gradient in the nearest data can lead the extrapolation into unrealistic values.
In theory the error field should allow one to identify such regions; in particular, the "exact" error field is generally higher in such regions than the approximations used in practice (but it is expensive to compute).
Did you try the logit transformation instead of loglin? The latter might be responsible for allowing larger values than expected.
Just a stupid question: are the values you use in DIVAnd for 2 µmole 2 or 2E-6? (That could play a role when the data are transformed.)
Related note or question to the developers: in loglin, is it implicitly assumed that the data have values around 1, so that the anomalies are around zero? Maybe we need to add a scaling parameter too?
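To illustrate the point above, here is a minimal sketch (not the DIVAnd implementation): a plain log transform maps values near 1 to anomalies near 0, while the same data expressed in different units (e.g. mol/l instead of µmol/l) map far from zero, which is what a hypothetical scaling parameter would fix.

```julia
# Sketch only: a bare log transform, illustrating why the magnitude of
# the input data matters for a log-type anamorphosis.
forward(x) = log(x)
inverse(y) = exp(y)

# The same concentration written as ~2 (µmol/l) vs. ~2e-6 (mol/l):
forward(2.0)      # ≈ 0.69, an anomaly close to zero
forward(2.0e-6)   # ≈ -13.1, far from zero

# A hypothetical scaling parameter would recenter the transform:
forward_scaled(x; scale = 1.0) = log(x / scale)
forward_scaled(2.0e-6; scale = 1.0e-6)  # ≈ 0.69 again
```

The `scale` keyword here is purely illustrative; it is not an existing DIVAnd parameter.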
Hi @jmbeckers, I agree it is not really a bug, but I was not sure which label to use... So far I haven't tried the logit transformation, and the value I use for 2 µmole is 2.
Update: logit instead of loglin = no change at all.
Use of logit: `transform = Anam.logit(min = minval, max = maxval)`
that is strange. What are the values of minval and maxval ?
minval = 0; maxval = 800 for DIN and 400 for silicate (in µmol/l)
Isn't that very large ? What are the obs values in the deep region ?
Yes, it is very large; I believe such high values can only be found in coastal areas, near river mouths. In the deep region, obs values remain below 100 µmol/l.
Is it too large? Should I restrict the bounds to the values found in the deep region?
I would say that ideally one should not have to restrict the values of the input data, except if they are obviously wrong, which doesn't seem to be the case here.
What can be done is to use the residuals (the difference between observation and analysis at the observation locations) to discard data points whose residual values are too large.
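A minimal sketch of this residual-based quality control, assuming the analysis has already been interpolated to the observation locations (the numbers below are synthetic, purely for illustration):

```julia
# Synthetic example: discard observations whose residual
# (observation minus analysis at the obs. location) is too large.
obsval   = [1.0, 2.0, 50.0, 3.0]   # observations
analysis = [1.2, 1.8,  4.0, 2.9]   # analysis at the obs. locations
residual = obsval .- analysis

threshold = 10.0                   # chosen by the user
keep = abs.(residual) .<= threshold
obsval_clean = obsval[keep]        # the 50.0 outlier is removed
```

How to extract the analysis at the observation points from a diva3d run is not shown here; only the selection logic is.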
Yes, OK, but I don't understand how this can help with the effects presented above?
Maybe by removing such data points you avoid the large gradients mentioned by @jmbeckers, so the extrapolation doesn't give so dramatic results.
> I would say that ideally one should not have to restrict the values of the input data, except if they are obviously wrong, which doesn't seem to be the case here.
> What can be done is to use the residuals (diff. between observation and analysis at the location of the obs.) to discard some of the data points when the residual values is too large.
I did not mean to "cut" the input values. The logit transformation constrains the analysis to remain between two bounds. So if you know that there are never values larger than 100, logit with an upper value of 100 will make the analysis always fall below 100 (and also will probably reduce some gradients and hence avoid some larger extrapolations).
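The bounding behaviour described above can be sketched as follows (a generic logit anamorphosis, not the exact DIVAnd code): the forward map sends the interval (a, b) onto the whole real line, the analysis is done in the transformed space, and the inverse map guarantees the result lies back inside (a, b).

```julia
# Generic logit anamorphosis with bounds a < x < b (illustrative sketch).
logit_fwd(x; a = 0.0, b = 100.0) = log((x - a) / (b - x))
logit_inv(y; a = 0.0, b = 100.0) = a + (b - a) / (1 + exp(-y))

x = 40.0
y = logit_fwd(x)      # analysis would happen in this transformed space
logit_inv(y)          # recovers 40.0

# Even an extreme value in transformed space maps back inside the bounds:
logit_inv(1.0e6)      # stays at or below the upper bound 100
```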
Thank you for your answer @jmbeckers. Indeed, it was not clear to me, but I understand now.
Hi, I would like to get back to the logit transformation's constraint on the values via the "min" and "max" parameters: the loglin transformation also has a "max" parameter. I assumed it referred to the maximum value that would be allowed in the retrieved field, but the result of the analysis does not confirm my assumption. So my question is: what does this "max" parameter in the loglin transformation refer to?
Loglin documentation:

> Provide the following transform `log(x + epsilon)` (for x < t) and its inverse. Beyond the threshold `t` (x ≥ t), the function is extended linearly in a continuous way.
So for loglin, beyond the max value a linear transformation is used (not cutting/clipping; for that, logit is needed).
OK, my mistake, I forgot to re-check that. Thank you for your answer.
With logit instead of loglin, I tried on chlorophyll-a data, using min = 0 mg/m3 and max = 5 mg/m3, and get the following error:

```
ERROR: LoadError: DomainError with -2.0224948157370686:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).
```

On the same data, with loglin (using 5 mg/m3 as the threshold for the linear extension), there is no error.
It probably means you have data outside of that range (there is no sanity check in the function before performing the anamorphosis). Can you check whether the input data are within the range?
Yes, I most likely have data outside of that range. I thought it would work nevertheless, taking into account only the data within the range; I thought the logit transformation would only act on the analysis.
I think it is sounder if you decide yourself which data to retain, or what to do with data outside your desired range (you can decide to discard or clip them).
Logit works on the analysis, but to do so all data are transformed into the new domain, so if data fall outside the bounds, the transformation is not valid.
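A short sketch of why the DomainError above appears, and one possible workaround (generic code, not the DIVAnd internals): for data outside (min, max) the logit argument becomes negative, and `log` of a negative real number throws exactly that error; clipping the data slightly inside the bounds before the transform avoids it.

```julia
# Why the DomainError occurs: for x > hi the ratio (x - lo)/(hi - x)
# is negative, and log() of a negative real throws DomainError.
lo, hi = 0.0, 5.0
logit_fwd(x) = log((x - lo) / (hi - x))

bad = 6.3            # a chlorophyll value above max = 5
# logit_fwd(bad)     # would throw: DomainError with -2.02...

# One workaround (a user decision, per the discussion above):
# clip the data slightly inside the bounds before transforming.
pad = 1e-6
data    = [0.5, 2.0, 6.3]
clipped = clamp.(data, lo + pad, hi - pad)
vals    = logit_fwd.(clipped)   # all finite, no error
```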