MultivariateStats.jl

Correspondence Analysis Implementation?

Open Heladio-ac opened this issue 5 years ago • 6 comments

Are there any plans to implement Correspondence Analysis and Multiple Correspondence Analysis?

Heladio-ac avatar Oct 13 '19 00:10 Heladio-ac

Not in the plans, as I have no knowledge of it. Is this some kind of variation of PCA? If you have something written, your PR is always welcome.

wildart avatar Oct 14 '19 22:10 wildart

It is related to PCA: it makes it possible to apply PCA to categorical data by using contingency tables. I'll try to work on it!
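For readers unfamiliar with the term, a contingency table is just a cross-tabulation of counts for two categorical variables. The sketch below builds one in plain Julia; `contingency_table` is a hypothetical helper written for this illustration (it is not part of MultivariateStats.jl), and the eye/hair data are made up.

```julia
# Hypothetical helper, for illustration only: cross-tabulate two categorical vectors.
function contingency_table(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(ArgumentError("x and y must have the same length"))
    rowlevels = sort(unique(x))
    collevels = sort(unique(y))
    rowindex = Dict(level => i for (i, level) in enumerate(rowlevels))
    colindex = Dict(level => j for (j, level) in enumerate(collevels))
    N = zeros(Int, length(rowlevels), length(collevels))
    for (xi, yi) in zip(x, y)
        N[rowindex[xi], colindex[yi]] += 1   # one count per observed pair
    end
    return N, rowlevels, collevels
end

# Made-up data: eye and hair colour for six people.
eye  = ["blue", "brown", "brown", "blue", "green", "brown"]
hair = ["fair", "dark", "dark", "fair", "fair", "fair"]

N, eyelevels, hairlevels = contingency_table(eye, hair)
```

Correspondence analysis then works on a matrix of counts like `N`, with rows and columns playing the role that observations and variables play in PCA.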

Heladio-ac avatar Oct 15 '19 03:10 Heladio-ac

Any news on that front?

atantos avatar May 24 '21 16:05 atantos

I'm working on this at the moment and will submit a pull request when I find the time to finish it off. Here is a barebones function in the meantime. It follows the computational algorithm outlined in Appendix A of Greenacre (2017) and implemented in the R function ca::ca() (Nenadic and Greenacre, 2007).

I've checked, in a Quarto notebook, that the standard coordinates returned by this function are equal to those produced by ca::ca(), using base::all.equal() on the dune dataset bundled with the R package vegan.

using NamedArrays
using LinearAlgebra

function correspondence_analysis(N::NamedMatrix)
  
  # A.1 Create the correspondence matrix
  P = N / sum(N)

  # A.2 Calculate column and row masses
  r = vec(sum(P, dims = 2))
  c = vec(sum(P, dims = 1))

  # A.3 Diagonal matrices of row and column masses
  Dr = Diagonal(r)
  Dc = Diagonal(c)

  # A.4 Calculate the matrix of standardized residuals
  SR = Dr^(-1/2) * (P - r * transpose(c)) * Dc^(-1/2)

  # A.5 Calculate the singular value decomposition (SVD) of SR
  # Destructuring avoids shadowing the `svd` function with a local variable.
  U, S, V = svd(SR)
  D = Diagonal(S)

  # A.6 Standard coordinates Φ of rows
  # The thin SVD yields length(S) = min(size(N)...) dimensions, so label that
  # many columns (size(N, 1) labels would not fit when N has more rows than columns).
  Φ_rownames = names(N)[1]
  Φ_colnames = ["Dim$i" for i in 1:length(S)]
  Φ = NamedArray(Dr^(-1/2) * U, names = (Φ_rownames, Φ_colnames), dimnames = ("Row", "Dimension"))
  
  # A.7 Standard coordinates Γ of columns
  Γ_rownames = names(N)[2]
  Γ_colnames = ["Dim$i" for i in 1:length(S)]
  Γ = NamedArray(Dc^(-1/2) * V, names = (Γ_rownames, Γ_colnames), dimnames = ("Column", "Dimension"))
  
  # A.8 Principal coordinates F of rows
  F = Dr^(-1/2) * U * D
  
  # A.9 Principal coordinates G of columns
  G = Dc^(-1/2) * V * D

  results = (sv = D,
             rownames = names(N)[1],
             rowmass = r,
             rowcoord = Φ,
             colnames = names(N)[2],
             colmass = c,
             colcoord = Γ
            )

  return results

end
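A quick way to sanity-check steps A.1 to A.5 without R is to use the fact that the total inertia (the sum of squared singular values of the standardized residuals) equals Pearson's chi-square statistic divided by the table total. The sketch below does this on plain matrices rather than NamedArrays; the 3×3 table is made up for illustration and is not the dune data.

```julia
using LinearAlgebra

# Made-up 3×3 table of counts, used only for illustration.
N = [20 10  5;
     10 15 10;
      5 10 25]

n = sum(N)
P = N / n                      # A.1 correspondence matrix
r = vec(sum(P, dims = 2))      # A.2 row masses
c = vec(sum(P, dims = 1))      # A.2 column masses

# A.4 standardized residuals
SR = Diagonal(1 ./ sqrt.(r)) * (P - r * c') * Diagonal(1 ./ sqrt.(c))

# A.5 singular values; total inertia is the sum of their squares
σ = svdvals(SR)
total_inertia = sum(abs2, σ)

# Pearson's chi-square statistic from expected counts under independence
E = n * r * c'
chisq = sum((N .- E) .^ 2 ./ E)

@assert total_inertia ≈ chisq / n

# The smallest singular value is zero up to rounding: SR has rank at most
# min(size(N)...) - 1, so the last dimension carries no inertia.
@assert isapprox(σ[end], 0; atol = 1e-10)
```

The second assertion suggests that the last dimension in the output of a function like the one above is trivial and could be dropped, although that is a design choice left to the eventual pull request.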

References

Greenacre, Michael. 2017. Correspondence Analysis in Practice, Third Edition. CRC Press.

Nenadic, Oleg, and Michael Greenacre. 2007. “Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package.” Journal of Statistical Software 20 (February): 1–13. https://doi.org/10.18637/jss.v020.i03.

ZekeMarshall avatar Feb 07 '24 18:02 ZekeMarshall

Looking forward to seeing the commit! Until then, I will be experimenting with the function here. Thanks!

atantos avatar Feb 11 '24 07:02 atantos