ForestTools icon indicating copy to clipboard operation
ForestTools copied to clipboard

Possible Mistake in GLCM Sum Average

Open ailich opened this issue 2 years ago • 6 comments

Hi, I was looking into implementing GLCM Sum Average in my own R package and noticed a potential mistake in the code in this repository (ailich/GLCMTextures#26). I could be wrong but I noticed that in this line i+1 is used rather than i-1. I believe i+1 should only be used if starting the loop at 0 and linking it to a corresponding index in a vector/matrix since R is 1 rather than 0 indexed. Additionally in the original Haralick formula the same value is used for both the px+y and the multiplication so I don't see why it should be i+1 and i-1.

Screenshot 2023-11-20 112511

There is also this version of the formula that adjusts for gray levels starting at 0 rather than 1, which seems to generally be how the methods are applied.

Screenshot 2023-11-20 113334

Moreover, I noticed it loops through i in 1:(2*nrow(glcm) which I believe is incorrect. For example with 4 gray levels that would mean you'd have values 0-3 and a maximum sum of 6. Therefore if plugging in i-1, i should go from 1 to 7, but the current code would go from 1:8 even though it is impossible for i-1 to sum to 7. Also just a note, the first iteration could technically be skipped since anything times 0 is 0 and it would not affect the sum.

ailich avatar Nov 20 '23 16:11 ailich

Hi @ailich, I finally got around to going through your library. Nice work!

As you say in our README:

When comparing results across different software that calculate GLCM texture metrics, there are inconsistencies among results.

This has been my experience as well. To be honest, although I understand the principle behind GLCMs, I've found many of the formulas for calculating the metrics (sum average, entropy, etc.) a bit confusing.

Furthermore, ForestTools isn't really meant to be a library for computing textures. It used to depend on radiomics. The only addition I made was that it would calculate GLCM metrics for segments (i.e.: irregularly shaped images) instead of rectangular images.

It's best to not have multiple GLCM libraries floating around giving people different results. If I were to have ForestTools use the code in GLCMTextures, would there be any way to essentially have NA values in the image that get ignored?

Ex.:

> seg_mat <- matrix(c(2,NA,NA,6,6,0,2,2,NA,1,NA,NA,2,NA,NA), ncol=5, nrow=3)
> seg_mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    6    2    1    2
[2,]   NA    6    2   NA   NA
[3,]   NA    0   NA   NA   NA

andrew-plowright avatar Jan 13 '24 18:01 andrew-plowright

@andrew-plowright, the GLCMTextures package can handle NA values by specifying na.rm=TRUE. The main function of the package, glcm_textures, is a focal/moving window function that will move across entire raster using a rectangular window, so that doesn't seem to be what you need. The make_glcm and glcm_metrics functions though can be used to calculate a single value for each texture measure over a matrix of gray values, so I those could be useful for ForestTools. make_glcm takes a matrix of quantized values and returns a glcm containing counts if normalize=FALSE and probabilities if normalize=TRUE. glcm_metrics takes a normalized glcm and returns the requested texture measures.

library(GLCMTextures)
#> Loading required package: terra
#> terra 1.7.64

seg_mat <- matrix(c(2,NA,NA,6,6,0,2,2,NA,1,NA,NA,2,NA,NA), ncol=5, nrow=3)

glcm<- make_glcm(seg_mat, n_levels = 7, shift = c(1,0), na.rm = TRUE, normalize = TRUE)
glcm
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    0  0.0  0.0    0    0    0  0.0
#> [2,]    0  0.0  0.2    0    0    0  0.0
#> [3,]    0  0.2  0.0    0    0    0  0.3
#> [4,]    0  0.0  0.0    0    0    0  0.0
#> [5,]    0  0.0  0.0    0    0    0  0.0
#> [6,]    0  0.0  0.0    0    0    0  0.0
#> [7,]    0  0.0  0.3    0    0    0  0.0

glcm_metrics(glcm, c("glcm_contrast", "glcm_dissimilarity", "glcm_homogeneity", "glcm_ASM",
                     "glcm_entropy", "glcm_mean", "glcm_variance", "glcm_correlation", "glcm_SA"))
#>      glcm_contrast glcm_dissimilarity   glcm_homogeneity           glcm_ASM 
#>         10.0000000          2.8000000          0.2352941          0.2600000 
#>       glcm_entropy          glcm_mean      glcm_variance   glcm_correlation 
#>          1.3661588          3.0000000          4.0000000         -0.2500000 
#>            glcm_SA 
#>          6.0000000

Created on 2024-01-14 with reprex v2.0.2

ailich avatar Jan 14 '24 19:01 ailich

Thanks for your response Alex. First, I just wanted to mention that I may have found a bug when using these settings.

library(GLCMTextures)
seg_mat <- matrix(c(2,NA,NA,6,6,0,2,2,NA,1,NA,NA,2,NA,NA), ncol=5, nrow=3)
make_glcm(seg_mat, n_levels = 6, shift = c(1,0), na.rm = TRUE, normalize = FALSE)

This seems to cause the system to crash.

Secondly: is it correct that make_glcm both quantizes the input matrix and then computes the glcm? Is there any chance that those two operations could be teased apart as separate functions?

ForestTools already has some code that will "break apart" an image into several smaller matrices according to a segmentation pattern. However, for the statistics produced from those segments to be comparable, I assume the image should only be quantized once prior to being segmented.

Does that make sense, and does that align with your understanding of how these numbers should be interpreted?

andrew-plowright avatar Jan 20 '24 00:01 andrew-plowright

Thanks @andrew-plowright. The reason that's crashing it is because with values 0-6 you have 7 grey levels (and in fact GLCMTextures always assumes that the lowest grey level is zero). With 6 grey levels, the GLCM will be a 6x6 matrix. Since C++ is zero indexed (the first row/column is called 0, not 1), when the code runs into the number "6" in segmat, it will try to add a count to the 7th row/column which is outside the bounds of a 6x6 matrix. This causes the C++ code to crash and R to abort. I did have a check for that, but it there was a bug in that so thank you for pointing that out. I just fixed it in the latest commit. The code will now send an error message rather than crash R.

library(GLCMTextures)
#> Loading required package: terra
#> terra 1.7.64
seg_mat <- matrix(c(2,NA,NA,6,6,0,2,2,NA,1,NA,NA,2,NA,NA), ncol=5, nrow=3)
make_glcm(seg_mat, n_levels = 6, shift = c(1,0), na.rm = TRUE, normalize = FALSE)
#> Error in make_glcm(seg_mat, n_levels = 6, shift = c(1, 0), na.rm = TRUE, : Error: x must have values between 0 and n_levels-1

Created on 2024-01-22 with reprex v2.0.2

ailich avatar Jan 20 '24 04:01 ailich

@andrew-plowright, there seems to be some slight terminology confusion. Quantization and normalization are not the same thing. Quantization is done to the raw data to reduce it to discrete integers (the grey levels). Normalization converts counts in the GLCM to probabilities. You'll want to have normalize=TRUE (the default behavior) since all the texture measure formulas use probabilities. The option to set normalize=FALSE is pretty much only there for demonstration purposes in the ReadME that walk through how the GLCM is created step by step.

library(GLCMTextures)
#> Loading required package: terra
#> terra 1.7.64
seg_mat <- matrix(c(2,NA,NA,6,6,0,2,2,NA,1,NA,NA,2,NA,NA), ncol=5, nrow=3)

glcm_normFALSE<- make_glcm(seg_mat, n_levels = 7, shift = c(1,0), na.rm = TRUE, normalize = FALSE)

glcm_normTRUE<- make_glcm(seg_mat, n_levels = 7, shift = c(1,0), na.rm = TRUE, normalize = TRUE)

glcm_normFALSE # Integers representing counts of when values were next to each other in seg_mat
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    0    0    0    0    0    0    0
#> [2,]    0    0    2    0    0    0    0
#> [3,]    0    2    0    0    0    0    3
#> [4,]    0    0    0    0    0    0    0
#> [5,]    0    0    0    0    0    0    0
#> [6,]    0    0    0    0    0    0    0
#> [7,]    0    0    3    0    0    0    0

glcm_normTRUE # Decimals representing the probability that values were next to each other in seg_mat
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    0  0.0  0.0    0    0    0  0.0
#> [2,]    0  0.0  0.2    0    0    0  0.0
#> [3,]    0  0.2  0.0    0    0    0  0.3
#> [4,]    0  0.0  0.0    0    0    0  0.0
#> [5,]    0  0.0  0.0    0    0    0  0.0
#> [6,]    0  0.0  0.0    0    0    0  0.0
#> [7,]    0  0.0  0.3    0    0    0  0.0

sum(glcm_normTRUE) #Probabilities sum to 1
#> [1] 1

glcm_normFALSE/sum(glcm_normFALSE) # Dividing the counts by the total number of counts (the sum) provides an equivalent result to setting normalize=TRUE
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    0  0.0  0.0    0    0    0  0.0
#> [2,]    0  0.0  0.2    0    0    0  0.0
#> [3,]    0  0.2  0.0    0    0    0  0.3
#> [4,]    0  0.0  0.0    0    0    0  0.0
#> [5,]    0  0.0  0.0    0    0    0  0.0
#> [6,]    0  0.0  0.0    0    0    0  0.0
#> [7,]    0  0.0  0.3    0    0    0  0.0

Created on 2024-01-20 with reprex v2.0.2

make_glcm requires that the input matrix already be quantized from 0 to n_levels-1. I assume you're working with SpatRaster objects from terra. The entire raster can be quantized using quantize_raster which has the option for equal probability (approximately the same number of cells end up in each bin based on quantiles) or equal range quantization which creates equally sized bins in terms of the range of the data (e.g. 1-5 -> 0, 6-10 ->1, 11-15 ->2, etc). This quantized raster could then be provided to ForestTools to break it into segments, and then make_glcm and glcm_metrics could be applied to the matrix from each of those segments.

ailich avatar Jan 20 '24 04:01 ailich

Thanks @ailich, this is all sounding quite promising. I'll create a branch of ForestTools this weekend and try implementing this.

andrew-plowright avatar Jan 24 '24 01:01 andrew-plowright

Hi @ailich , it took a while but I've just pushed a new version of this package whereby the calculation of GLCMs and the associated metrics are handled by GLCMTextures. This leaves ForestTools to focus on implementing GLCM for individually segmented tree crowns instead of handling the foundational calculations, thus avoiding having two different R packages that are doing the same thing.

Thank you very much for your work on GLCMTextures, and for taking the time to reach out and inform me of the potential error in calculations. Much appreciated.

andrew-plowright avatar Apr 27 '24 22:04 andrew-plowright