Rasters.jl
Rasters.jl copied to clipboard
Support for Raster Attribute Tables?
I ran into this raster with a Raster Attribute Table (RAT for short). Apparently it's an ArcGIS thing but GDAL does have some support for it.
The file is from here. (There are a bunch of similar rasters, this is just the smallest one)
Basically the values of the raster are integers that refer to a row in a column that is stored in a separate file (with .tif.vat.dbf extension).
Short demonstration how this works in terra:
> library(terra)
> r <- rast("sub_leaf_type_2021.tif")
> plot(r)
> activeCat(r)
[1] 1
> activeCat(r) <- 4
> plot(r)
> cats(r)
[[1]]
VALUE C_01 C_02 C_03 C_04
1 0 0 0 ND ND
2 1 1 1 Løv Broadleaved
3 2 2 2 Nål Coniferous
4 3 3 3 Ikke klassificeret Not classified
> levels(r)
[[1]]
VALUE C_04
1 0 ND
2 1 Broadleaved
3 2 Coniferous
4 3 Not classified
ArchGDAL does have a getdefaultRAT function but it seems to not recognize the table in this case.
using ArchGDAL
rs = ArchGDAL.read("sub_leaf_type_2021.tif")
band = ArchGDAL.getband(rs, 1)
RAT = ArchGDAL.getdefaultRAT(band)
RAT.ptr === C_NULL # true
The easiest thing to do would be to just read the table into the metadata? It is a pretty silly format to be honest so I don't know if we need the whole functionality that terra offers.
I don't really understand what youre doing with terra, but seems its a partly manual process?
I don't really know what I'm doing with terra either - but I guess it's similar to recode! in CategoricalArrays. So changing the categories shown without changing the ref values in the array.
The manual thing is switching between the columns in the table - but that will always have to be that way. No way to tell in advance which column a user is interested in.
But the most important bit here I think is to be able to access the table from this kind of file.
This worked with an xml file for me! There's an example in geocompr.
Gdal should show the RAT in the dataset's list of files. If it doesn't then it didn't recognize it. Maybe there's a compile option we missed? Would be good to see what Terra does here.
Gdal should show the RAT in the dataset's list of files.
In the example above:
julia> rs
GDAL Dataset (Driver: GTiff/GeoTIFF)
File(s):
D:/data/Basemap/leaftype/sub_leaf_type_2021.tif
D:/data/Basemap/leaftype/sub_leaf_type_2021.tif.ovr
D:/data/Basemap/leaftype/sub_leaf_type_2021.tif.xml
D:/data/Basemap/leaftype/sub_leaf_type_2021.tif.aux.xml
So the .tif.var.dbf is not recognized (there are also .tif.vat.cpg and .tfw files in this example that are not recognized, no clue what those do)
As of 2012 GDAL did not recognize DBF files with Rasters, but I have no clue whether that changed in the ~13 years since then
https://lists.osgeo.org/pipermail/gdal-dev/2012-September/033995.html
Terra actually reads this table (it's not a RAT but a VAT which I have 0 clue what that is) through its own dbf reader apparently
apparently a VAT is an ESRI value attribute table https://support.esri.com/en-us/gis-dictionary/vat#:~:text=%5BEsri%20software%5D%20Acronym%20for%20value,grid https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/raster-dataset-attribute-tables.htm#:~:text=When%20a%20raster%20attribute%20table,in%20the%20raster%20attribute%20table. https://support.esri.com/en-us/gis-dictionary/attribute-table#:~:text=%5Bdata%20structures%5D%20A%20database%20or,symbolize%20features%20or%20raster%20cells.
so I guess they were trying to standard on DBF tables across rasters and shapefiles (side note: why?!)
There's a lot from that function we can copy here in terms of handling, especially the color table etc.
Small reminder that we cant use that code, let alone read it, to make a PR to Rasters.jl under the current licenses of both packages.
Ah damn, i forgot its GPL
CategoricalArrays is probably a bad idea here because it forces a dict lookup, where an offset vector would be substantially more efficient. We probably need to write our own.
I actually just ran into another one of these VATs in the wild! And I found out about this package https://github.com/JuliaData/DBFTables.jl (which of course you 2 contributed to). That works to read these tables in - so at least no need to look at the terra source code anymore
Oh yeah, the only reason to look at Terra was to see how it detected the files - all the tooling already exists in Julia land :D
But the majority of the work here will be (a) hooking this into Rasters and (b) creating a custom categorical like array, maybe cribbed from IndirectArrays.jl at first, which:
- has a "value lookup array" whose index starts from zero
- has the ability to add more values on demand (trivial)
- supports missing value passthrough for which I believe the best solution is that we implement our own such array in Rasters and copy all the tests over from IndirectArrays.
But CategoricalArrays has pretty good ecosystem support, like MLJ etc...
@tiemvanderdeure wouldn't a CategoricalArray compatible thing be best? Although I guess it's still a Raster so dispatch won't work anyway...
Well I was thinking something on the modify level. But ecosystem compat will break either way for the reason you mentioned, so we may as well be efficient. CategoricalArrays does force a Dict lookup which I don't like too much for this.
We can always have it be a subtype of AbstractCategoricalArray though...
Yeah I also dislike their overuse of Dict. Would be nice if there was a traits based interface that could say "this raster is categorical" to all the stats packages, without needing dispatch on some AbstractCategoricalArray
Just to note, the way to read this VAT (what a cesspool, indeed)
julia> tab = DBFTables.Tables.columntable(DBFTables.Table("/Users/anshul/Downloads/sub_leaf_type_2021/sub_leaf_type_2021.tif.vat.dbf"))
(
VALUE = Union{Missing, Int64}[0, 1, 2, 3],
COUNT = Union{Missing, Int64}[679494987, 40761803, 23706893, 4797940],
C_01 = Union{Missing, Int64}[0, 1, 2, 3],
C_02 = Union{Missing, Int64}[0, 1, 2, 3],
C_03 = Union{Missing, String}["ND", "Løv", "Nål", "Ikke klassificeret"],
C_04 = Union{Missing, String}["ND", "Broadleaved", "Coniferous", "Not classified"]
)
now from there I guess you can always specify a symbol. It would be nice to do categorical(ras, :C_04) but I think that ship has sailed.
Please please please make this anything but a DBFTable before you run lookups on it. DBFTable stores things natively as strings and is terribly terribly slow.
They do have some native number types, notably an Int that doesn't use twos-compliment encoding
I think we can quite easily though this using modifieddiskarray, similarly to how we apply scaling. This is better than relying on categoricalarrays I think as it will work seamlessly for lazy rasters also. Users can always wrap in categorical afterwards if they want a CA.
Makes sense. So that needs an additional field in the Mod object for category mappings?
And an extension on CategoricalArrays to allow the conversion?