sgkit icon indicating copy to clipboard operation
sgkit copied to clipboard

pc_relate issue

Open shy218 opened this issue 4 years ago • 5 comments

When I run the pc_relate on my dataset, it gives an error "Input must be a square matrix to perform lu decomposition" in line 157 (half_beta = da.linalg.inv(2 * r).dot(q.T).dot(imputed_call_g.T)). I get an insight into QR decomposition, and the output q is squared, but r is not. However, if we don't run line 153. (pcsi = pcsi.T.rechunk((None, -1)), the r will be square. Do you know how to fix this?

shy218 avatar Apr 25 '21 06:04 shy218

Hey @shy218, could you attach the code you are using and show the input dataset for the call to pc_relate?

eric-czech avatar Apr 25 '21 08:04 eric-czech

The code and data are in this github repo: https://github.com/shy218/sgkit-benchmark

shy218 avatar Apr 25 '21 10:04 shy218

@shy218 thanks for the repo! What is the source of that input data? What's the reason for this line:

variant_allele=(("variants","alleles"), ds['variant_allele'][:,0:2]),

ravwojdyla avatar Apr 29 '21 15:04 ravwojdyla

@shy218 ah now I see, what was that line above for. Anyhow pc_relate requires that sample_pcs be (components, samples), as documented here, you are submitting sample_pca_component which is (variants, components). What you are looking for is sample_pca_projection BUT you need to transpose to be (components, samples) (it originally comes out as (samples, components) from our pca).

Btw, you might have more issues with that data since it fails to converge during PCA, you can see that by running:

rel.compute()

and you will get:

raise LinAlgError("SVD did not converge")

See comments here.

ravwojdyla avatar Apr 29 '21 17:04 ravwojdyla

Also created a separate issue to make pca result readily usable in the context of pc_relate: #554 (without the extra transpose).

ravwojdyla avatar Apr 29 '21 17:04 ravwojdyla

Closing as it looks like this was resolved

tomwhite avatar Jan 04 '23 16:01 tomwhite