sgkit
sgkit copied to clipboard
pc_relate issue
When I run the pc_relate on my dataset, it gives an error "Input must be a square matrix to perform lu decomposition" in line 157 (half_beta = da.linalg.inv(2 * r).dot(q.T).dot(imputed_call_g.T)). I get an insight into QR decomposition, and the output q is squared, but r is not. However, if we don't run line 153. (pcsi = pcsi.T.rechunk((None, -1)), the r will be square. Do you know how to fix this?
Hey @shy218, could you attach the code you are using and show the input dataset for the call to pc_relate?
The code and data are in this github repo: https://github.com/shy218/sgkit-benchmark
@shy218 thanks for the repo! What is the source of that input data? What's the reason for this line:
variant_allele=(("variants","alleles"), ds['variant_allele'][:,0:2]),
@shy218 ah now I see, what was that line above for. Anyhow pc_relate requires that sample_pcs be (components, samples), as documented here, you are submitting sample_pca_component which is (variants, components). What you are looking for is sample_pca_projection BUT you need to transpose to be (components, samples) (it originally comes out as (samples, components) from our pca).
Btw, you might have more issues with that data since it fails to converge during PCA, you can see that by running:
rel.compute()
and you will get:
raise LinAlgError("SVD did not converge")
See comments here.
Also created a separate issue to make pca result readily usable in the context of pc_relate: #554 (without the extra transpose).
Closing as it looks like this was resolved