Rasters.jl icon indicating copy to clipboard operation
Rasters.jl copied to clipboard

Support for Raster Attribute Tables?

Open tiemvanderdeure opened this issue 8 months ago • 19 comments

I ran into this raster with a Raster Attribute Table (RAT for short). Apparently it's an ArcGIS thing but GDAL does have some support for it.

The file is from here. (There are a bunch of similar rasters, this is just the smallest one)

Basically the values of the raster are integers that refer to a row in a column that is stored in a separate file (with .tif.vat.dbf extension).

Short demonstration how this works in terra:

> library(terra)
> r <- rast("sub_leaf_type_2021.tif")
> plot(r)
> activeCat(r)
[1] 1
> activeCat(r) <- 4
> plot(r)
> cats(r)
[[1]]
  VALUE C_01 C_02               C_03           C_04
1     0    0    0                 ND             ND
2     1    1    1                Løv    Broadleaved
3     2    2    2                Nål     Coniferous
4     3    3    3 Ikke klassificeret Not classified

> levels(r)
[[1]]
  VALUE           C_04
1     0             ND
2     1    Broadleaved
3     2     Coniferous
4     3 Not classified

Image

Image

ArchGDAL does have a getdefaultRAT function but it seems to not recognize the table in this case.

using ArchGDAL
rs = ArchGDAL.read("sub_leaf_type_2021.tif")
band = ArchGDAL.getband(rs, 1)
RAT = ArchGDAL.getdefaultRAT(band)
RAT.ptr === C_NULL # true

The easiest thing to do would be to just read the table into the metadata? It is a pretty silly format to be honest so I don't know if we need the whole functionality that terra offers.

sub_leaf_type_2021.zip

tiemvanderdeure avatar Mar 24 '25 08:03 tiemvanderdeure

I don't really understand what youre doing with terra, but seems its a partly manual process?

rafaqz avatar Mar 24 '25 09:03 rafaqz

I don't really know what I'm doing with terra either - but I guess it's similar to recode! in CategoricalArrays. So changing the categories shown without changing the ref values in the array.

The manual thing is switching between the columns in the table - but that will always have to be that way. No way to tell in advance which column a user is interested in.

But the most important bit here I think is to be able to access the table from this kind of file.

tiemvanderdeure avatar Mar 24 '25 09:03 tiemvanderdeure

This worked with an xml file for me! There's an example in geocompr.

Gdal should show the RAT in the dataset's list of files. If it doesn't then it didn't recognize it. Maybe there's a compile option we missed? Would be good to see what Terra does here.

asinghvi17 avatar Mar 24 '25 11:03 asinghvi17

Gdal should show the RAT in the dataset's list of files.

In the example above:

julia> rs
GDAL Dataset (Driver: GTiff/GeoTIFF)
File(s):
  D:/data/Basemap/leaftype/sub_leaf_type_2021.tif
  D:/data/Basemap/leaftype/sub_leaf_type_2021.tif.ovr
  D:/data/Basemap/leaftype/sub_leaf_type_2021.tif.xml
  D:/data/Basemap/leaftype/sub_leaf_type_2021.tif.aux.xml

So the .tif.var.dbf is not recognized (there are also .tif.vat.cpg and .tfw files in this example that are not recognized, no clue what those do)

tiemvanderdeure avatar Mar 24 '25 12:03 tiemvanderdeure

As of 2012 GDAL did not recognize DBF files with Rasters, but I have no clue whether that changed in the ~13 years since then

https://lists.osgeo.org/pipermail/gdal-dev/2012-September/033995.html

Terra actually reads this table (it's not a RAT but a VAT which I have 0 clue what that is) through its own dbf reader apparently

asinghvi17 avatar Mar 24 '25 15:03 asinghvi17

apparently a VAT is an ESRI value attribute table https://support.esri.com/en-us/gis-dictionary/vat#:~:text=%5BEsri%20software%5D%20Acronym%20for%20value,grid https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/raster-dataset-attribute-tables.htm#:~:text=When%20a%20raster%20attribute%20table,in%20the%20raster%20attribute%20table. https://support.esri.com/en-us/gis-dictionary/attribute-table#:~:text=%5Bdata%20structures%5D%20A%20database%20or,symbolize%20features%20or%20raster%20cells.

so I guess they were trying to standard on DBF tables across rasters and shapefiles (side note: why?!)

asinghvi17 avatar Mar 24 '25 15:03 asinghvi17

There's a lot from that function we can copy here in terms of handling, especially the color table etc.

Small reminder that we cant use that code, let alone read it, to make a PR to Rasters.jl under the current licenses of both packages.

rafaqz avatar Mar 24 '25 19:03 rafaqz

Ah damn, i forgot its GPL

asinghvi17 avatar Mar 24 '25 19:03 asinghvi17

CategoricalArrays is probably a bad idea here because it forces a dict lookup, where an offset vector would be substantially more efficient. We probably need to write our own.

asinghvi17 avatar Mar 25 '25 19:03 asinghvi17

I actually just ran into another one of these VATs in the wild! And I found out about this package https://github.com/JuliaData/DBFTables.jl (which of course you 2 contributed to). That works to read these tables in - so at least no need to look at the terra source code anymore

tiemvanderdeure avatar Apr 04 '25 14:04 tiemvanderdeure

Oh yeah, the only reason to look at Terra was to see how it detected the files - all the tooling already exists in Julia land :D

asinghvi17 avatar Apr 04 '25 14:04 asinghvi17

But the majority of the work here will be (a) hooking this into Rasters and (b) creating a custom categorical like array, maybe cribbed from IndirectArrays.jl at first, which:

  • has a "value lookup array" whose index starts from zero
  • has the ability to add more values on demand (trivial)
  • supports missing value passthrough for which I believe the best solution is that we implement our own such array in Rasters and copy all the tests over from IndirectArrays.

asinghvi17 avatar Apr 04 '25 14:04 asinghvi17

But CategoricalArrays has pretty good ecosystem support, like MLJ etc...

@tiemvanderdeure wouldn't a CategoricalArray compatible thing be best? Although I guess it's still a Raster so dispatch won't work anyway...

rafaqz avatar Apr 04 '25 15:04 rafaqz

Well I was thinking something on the modify level. But ecosystem compat will break either way for the reason you mentioned, so we may as well be efficient. CategoricalArrays does force a Dict lookup which I don't like too much for this.

We can always have it be a subtype of AbstractCategoricalArray though...

asinghvi17 avatar Apr 04 '25 15:04 asinghvi17

Yeah I also dislike their overuse of Dict. Would be nice if there was a traits based interface that could say "this raster is categorical" to all the stats packages, without needing dispatch on some AbstractCategoricalArray

rafaqz avatar Apr 05 '25 13:04 rafaqz

Just to note, the way to read this VAT (what a cesspool, indeed)

julia> tab = DBFTables.Tables.columntable(DBFTables.Table("/Users/anshul/Downloads/sub_leaf_type_2021/sub_leaf_type_2021.tif.vat.dbf"))
(
VALUE = Union{Missing, Int64}[0, 1, 2, 3], 
COUNT = Union{Missing, Int64}[679494987, 40761803, 23706893, 4797940], 
C_01 = Union{Missing, Int64}[0, 1, 2, 3], 
C_02 = Union{Missing, Int64}[0, 1, 2, 3], 
C_03 = Union{Missing, String}["ND", "Løv", "Nål", "Ikke klassificeret"], 
C_04 = Union{Missing, String}["ND", "Broadleaved", "Coniferous", "Not classified"]
)

now from there I guess you can always specify a symbol. It would be nice to do categorical(ras, :C_04) but I think that ship has sailed.

Please please please make this anything but a DBFTable before you run lookups on it. DBFTable stores things natively as strings and is terribly terribly slow.

asinghvi17 avatar Apr 10 '25 20:04 asinghvi17

They do have some native number types, notably an Int that doesn't use twos-compliment encoding

rafaqz avatar Apr 10 '25 20:04 rafaqz

I think we can quite easily though this using modifieddiskarray, similarly to how we apply scaling. This is better than relying on categoricalarrays I think as it will work seamlessly for lazy rasters also. Users can always wrap in categorical afterwards if they want a CA.

tiemvanderdeure avatar Jun 26 '25 14:06 tiemvanderdeure

Makes sense. So that needs an additional field in the Mod object for category mappings?

And an extension on CategoricalArrays to allow the conversion?

rafaqz avatar Jun 27 '25 09:06 rafaqz