MultivariateStats.jl
Correspondence Analysis Implementation?
Are there any plans on implementing Correspondence Analysis and Multiple Correspondence Analysis?
Not in plans, as I have no knowledge of it. Is this some kind of variation of PCA? If you have something written, your PR is always welcome.
It is related to PCA; it makes it possible to apply PCA to categorical data by using contingency tables. I'll try to work on it!
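For readers unfamiliar with the term, here is a minimal sketch of what a contingency table is: a matrix of counts for each combination of two categorical variables. The data and the variable names (`eye`, `hair`) are made up for illustration and are not from this thread.

```julia
# Hypothetical observations of two categorical variables (made-up data)
eye  = ["blue", "brown", "brown", "green", "blue", "brown"]
hair = ["fair", "dark",  "dark",  "fair",  "dark", "fair"]

# Category labels, in sorted order, index the rows and columns
rows = sort(unique(eye))
cols = sort(unique(hair))

# Count how often each (eye, hair) combination occurs
N = zeros(Int, length(rows), length(cols))
for (e, h) in zip(eye, hair)
    N[findfirst(==(e), rows), findfirst(==(h), cols)] += 1
end
```

Correspondence analysis takes a table like `N` as its input, rather than the raw observations.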
Any news on that front?
I'm working on this at the moment and will submit a pull request when I find the time to finish it off. Here is a barebones function in the meantime. It follows the computational algorithm outlined in Appendix A of Greenacre (2017) and implemented in the R function ca::ca() (Nenadic and Greenacre, 2007).
I've checked that the standard coordinates from this function are equal to those produced by ca::ca(), using base::all.equal() in a Quarto notebook with the dune dataset bundled with the R package vegan.
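For reference, the steps labelled A.1 to A.9 in the function below can be summarized as follows. This is only a restatement of what the code computes: $S$ is the matrix called `SR` in the code, $D_\alpha$ is the diagonal matrix of singular values (`D` in the code), and $n$ and $\mathbf{1}$ (a vector of ones) are notation introduced here.

```latex
\begin{aligned}
P &= \tfrac{1}{n} N, \qquad n = \sum_{i,j} N_{ij} \\
r &= P\,\mathbf{1}, \qquad c = P^{\mathsf{T}}\mathbf{1}, \qquad
D_r = \operatorname{diag}(r), \qquad D_c = \operatorname{diag}(c) \\
S &= D_r^{-1/2}\,\bigl(P - r\,c^{\mathsf{T}}\bigr)\,D_c^{-1/2} = U D_\alpha V^{\mathsf{T}} \\
\Phi &= D_r^{-1/2}\,U, \qquad \Gamma = D_c^{-1/2}\,V \\
F &= D_r^{-1/2}\,U D_\alpha = \Phi D_\alpha, \qquad
G = D_c^{-1/2}\,V D_\alpha = \Gamma D_\alpha
\end{aligned}
```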
    using NamedArrays
    using LinearAlgebra

    function correspondence_analysis(N::NamedMatrix)
        # A.1 Create the correspondence matrix
        P = N / sum(N)
        # A.2 Calculate row and column masses
        r = vec(sum(P, dims = 2))
        c = vec(sum(P, dims = 1))
        # A.3 Diagonal matrices of row and column masses
        Dr = Diagonal(r)
        Dc = Diagonal(c)
        # A.4 Calculate the matrix of standardized residuals
        SR = Dr^(-1/2) * (P - r * transpose(c)) * Dc^(-1/2)
        # A.5 Calculate the singular value decomposition (SVD) of SR
        # (stored in `decomp` so the local name does not shadow `svd`)
        decomp = LinearAlgebra.svd(SR)
        U = decomp.U
        V = decomp.V
        S = decomp.S
        D = Diagonal(S)
        # One label per singular value. U and V each have length(S) columns,
        # so this also works when N has more rows than columns.
        dim_labels = "Dim" .* string.(1:length(S))
        # A.6 Standard coordinates Φ of rows
        Φ_rownames = names(N)[1]
        Φ = NamedArray(Dr^(-1/2) * U, names = (Φ_rownames, dim_labels), dimnames = ("Row", "Dimension"))
        # A.7 Standard coordinates Γ of columns
        Γ_rownames = names(N)[2]
        Γ = NamedArray(Dc^(-1/2) * V, names = (Γ_rownames, dim_labels), dimnames = ("Column", "Dimension"))
        # A.8 Principal coordinates F of rows (computed, but not yet returned)
        F = Dr^(-1/2) * U * D
        # A.9 Principal coordinates G of columns (computed, but not yet returned)
        G = Dc^(-1/2) * V * D
        results = (sv = D,
                   rownames = names(N)[1],
                   rowmass = r,
                   rowcoord = Φ,
                   colnames = names(N)[2],
                   colmass = c,
                   colcoord = Γ
                   )
        return results
    end
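As a quick sanity check of steps A.1 to A.6, here is a minimal sketch that repeats them on a plain `Matrix` (no NamedArrays) and verifies three standard properties of correspondence analysis: the total inertia (sum of squared singular values) equals the chi-square statistic divided by the grand total, the last singular value is numerically zero because the residuals are centered, and the standard row coordinates are orthonormal when weighted by the row masses. The counts are made up for illustration, and this is not the check against ca::ca() described above.

```julia
using LinearAlgebra

# Hypothetical 3x3 contingency table (made-up counts, for illustration only)
N = [20.0 10.0  5.0;
      8.0 15.0 12.0;
      4.0  9.0 17.0]

n = sum(N)
P = N / n                        # A.1 correspondence matrix
r = vec(sum(P, dims = 2))        # A.2 row masses
c = vec(sum(P, dims = 1))        #     column masses

# A.4 standardized residuals, A.5 their SVD
SR = Diagonal(1 ./ sqrt.(r)) * (P - r * c') * Diagonal(1 ./ sqrt.(c))
decomp = svd(SR)

# Total inertia computed two ways
inertia_svd = sum(abs2, decomp.S)
E = n * r * c'                   # expected counts under independence
inertia_chi2 = sum((N - E) .^ 2 ./ E) / n

# A.6 standard row coordinates and their mass-weighted Gram matrix
Φ = Diagonal(1 ./ sqrt.(r)) * decomp.U
gram = Φ' * Diagonal(r) * Φ

println(isapprox(inertia_svd, inertia_chi2; atol = 1e-12))
println(decomp.S[end] < 1e-8)
println(isapprox(gram, Matrix{Float64}(I, 3, 3); atol = 1e-10))
```

All three lines should print `true`. The near-zero last singular value is the reason a table with m rows and k columns has at most min(m, k) - 1 non-trivial dimensions.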
References
Greenacre, Michael. 2017. Correspondence Analysis in Practice, Third Edition. CRC Press.
Nenadic, Oleg, and Michael Greenacre. 2007. “Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package.” Journal of Statistical Software 20 (3): 1–13. https://doi.org/10.18637/jss.v020.i03.
Looking forward to seeing the commit! Until then, I will be experimenting with the function here. Thanks!