
Gnarly results that I can't explain

Open StevenLangbroek opened this issue 2 years ago • 3 comments

Describe the bug

Hey hey! I'm working with a dataset called GSOC to calculate soil organic carbon for a geometry. Some geometries are producing... interesting :D results for us. If I run geoblaze.stats on the geometry, it gives me these results:

[
  {
    "count": 3,
    "valid": 3,
    "invalid": 0,
    "median": 43,
    "min": -3.3999999521443642e+38,
    "max": 43,
    "sum": -3.3999999521443642e+38,
    "range": 3.3999999521443642e+38,
    "mean": -1.1333333173814548e+38,
    "variance": 2.5688888165737062e+76,
    "std": 1.6027753481301446e+38,
    "histogram": {
      "43": {
        "n": 43,
        "ct": 2
      },
      "-3.3999999521443642e+38": {
        "n": -3.3999999521443642e+38,
        "ct": 1
      }
    },
    "modes": [
      43
    ],
    "mode": 43,
    "uniques": [
      -3.3999999521443642e+38,
      43
    ]
  }
]

That -3.3999999521443642e+38 doesn't seem right, and I have no idea what's causing it...

To Reproduce

// this is close to my house, found it by accident but the issue is prevalent in the dataset.
const geometry = [
                    [
                        [
                            13.457568617507945,
                            52.49182485147867
                        ],
                        [
                            13.46005856478584,
                            52.492777796740285
                        ],
                        [
                            13.476492216820844,
                            52.487622982948125
                        ],
                        [
                            13.479195588151981,
                            52.48467710379617
                        ],
                        [
                            13.473148573333333,
                            52.48155772259062
                        ],
                        [
                            13.457568617507945,
                            52.49182485147867
                        ]
                    ]
                ];
const gsoc = await geoblaze.parse('https://storage.googleapis.com/fao-maps-catalog-data/geonetwork/gsoc/GSOCmap/GSOCmap1.5.0.tif');
const stats = await geoblaze.stats(gsoc, geometry);

Expected behavior

I'm not sure, to be honest. QGIS doesn't give me values like this when loading the dataset and looking up the coordinates...

Any help at all understanding what's happening here would be greatly appreciated <3

StevenLangbroek avatar Jun 13 '23 14:06 StevenLangbroek

I'm also not entirely sure whether .mean produces correct results... It seems that it doesn't actually pick the mean, but produces an average? Is that possible? Given these values, I had expected mean to be (pseudo-code alert):

const values = [-3.39e+38, 42, 42]
expect(mean(values)).toBe(values[1]);
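For what it's worth, "mean" normally refers to the arithmetic average, while the middle value of the sorted list is the median. A minimal sketch in plain JavaScript (no geoblaze involved; the variable names are only illustrative) shows how the two differ for the three values reported above:

```javascript
// The three pixel values from the stats output above.
const values = [-3.3999999521443642e+38, 43, 43];

// Arithmetic mean: the sum divided by the count, so one huge outlier dominates.
const mean = values.reduce((sum, v) => sum + v, 0) / values.length;

// Median: the middle element of the sorted values (this shortcut only covers odd-length arrays).
const sorted = [...values].sort((a, b) => a - b);
const median = sorted[Math.floor(sorted.length / 2)];

console.log(median); // 43
console.log(mean < -1e38); // true
```

This matches the "median": 43 and the very negative "mean" in the output above.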

StevenLangbroek avatar Jun 13 '23 14:06 StevenLangbroek

So, I've found the area in the dataset explorer and these are indeed "correct" values (although they're obviously gibberish). Is there a way to filter these out somehow? I couldn't find an appropriate API for it, but I'm assuming that's because I'm a dum-dum :D.

StevenLangbroek avatar Jun 13 '23 14:06 StevenLangbroek

Hey, sorry about that. Nothing you've done wrong. There's often weirdness when dealing with no-data values and Float32 numbers. There's a discrepancy between the No Data Value provided in the GeoTIFF metadata ("-3.39999999999999996e+038\x00") and the value actually stored in the Float32 pixels as read in JavaScript (-3.3999999521443642e+38). I think the issue is that whatever system writes the noDataValue doesn't adjust the value to the bit depth of the encoding (but that's just a hunch). Unfortunately, it'd take some time to develop a proper fix.
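The mismatch can be reproduced in plain JavaScript without geoblaze. This is only a sketch of the rounding behaviour, assuming the pixels are stored as 32-bit floats; the variable names are illustrative, and the trailing NUL character from the metadata string is left out:

```javascript
// The noDataValue as written in the GeoTIFF metadata, parsed as a 64-bit double.
const metadataNoData = parseFloat("-3.39999999999999996e+038");

// A Float32 pixel cannot hold that double exactly;
// Math.fround rounds a number to the nearest 32-bit float.
const storedNoData = Math.fround(metadataNoData);

console.log(metadataNoData === storedNoData); // false
console.log(storedNoData === -3.3999999521443642e+38); // true
```

Because the two numbers are not strictly equal, a comparison against the metadata value never matches the no-data pixels, and they end up in the statistics.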

Fortunately, there are two possible workarounds. The easiest is to pass a filter function to the stats call:

const filter = value => value !== undefined && value !== -3.3999999521443642e+38;

geoblaze.stats(gsoc, geometry, undefined, filter);

The filter function is undocumented functionality, so I can't commit to maintaining that specific function param in the future. So if you go this route, I'd recommend locking your geoblaze version in your package.json (if you haven't already).
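If you'd rather not hard-code the rounded number, the filter could be derived from the metadata value. This is a rough sketch: makeNoDataFilter is a hypothetical helper, not part of geoblaze, and it assumes the raster stores Float32 pixels.

```javascript
// Hypothetical helper (not part of geoblaze): builds a filter that rejects
// undefined values and the no-data value in both its double and Float32 forms.
function makeNoDataFilter(noDataValue) {
  const asFloat32 = Math.fround(noDataValue);
  return value =>
    value !== undefined && value !== noDataValue && value !== asFloat32;
}

// -3.4e38 stands in for the value parsed from the GeoTIFF metadata.
const filter = makeNoDataFilter(-3.4e38);

// Quick check on a plain array that mimics the pixel values from this issue.
const kept = [43, -3.3999999521443642e+38, 43, undefined].filter(filter);
console.log(kept); // [ 43, 43 ]
```

The resulting function would be passed as the fourth argument, in the same way as the hard-coded filter above.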

Alternatively, a solution that is guaranteed to keep working is to correct the noDataValue after parsing, like so:

import parseGeoRaster from "georaster";

// url is the GeoTIFF URL from the reproduction above
const georaster = await parseGeoRaster(url);
// replace the metadata value with the Float32 value actually stored in the pixels
georaster.noDataValue = -3.3999999521443642e+38;
const stats = await geoblaze.stats(georaster, geometry);

This is guaranteed to work because it's using georaster's public API, which I will always do my best to maintain.
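The same correction could be written without hard-coding the number. This is a sketch under the assumption that the raster holds Float32 pixels; alignNoDataToFloat32 is a hypothetical helper, not part of georaster, and the object below is only a stand-in for a parsed georaster.

```javascript
// Hypothetical helper (not part of georaster): rounds noDataValue to Float32
// so that it compares equal to the values stored in a 32-bit float raster.
function alignNoDataToFloat32(georaster) {
  if (typeof georaster.noDataValue === "number") {
    georaster.noDataValue = Math.fround(georaster.noDataValue);
  }
  return georaster;
}

// Stand-in object for illustration; a real one would come from parseGeoRaster(url).
const fakeGeoraster = { noDataValue: -3.4e38 };
alignNoDataToFloat32(fakeGeoraster);

console.log(fakeGeoraster.noDataValue === -3.3999999521443642e+38); // true
```

Only the noDataValue property is touched, so this stays within the public API mentioned above.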

Let me know if this helps. Happy to provide more assistance.

Also, thank you so much for alerting me to this issue and this great dataset. Now, with a publicly available GeoTIFF file in hand, I'll be able to write some tests and start thinking about how to solve this issue in the future.

DanielJDufour avatar Jun 18 '23 22:06 DanielJDufour