peddy
KeyError in PCA.py
Hi Brent,
I cloned the peddy package yesterday and have got this error:
Traceback (most recent call last):
File "/home/torme/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/torme/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/torme/Desktop/TOOLS/peddy/peddy/__main__.py", line 14, in
I get output files ped_check and ped_check_rel-difference, but no files for sex or PCA.
I get an error at position 1:900505. I'm not sure in what order peddy processes the variants in the file, but this is not the first variant position in my VCF. (Perhaps for speed the program does not go through them in file order, so this may not be relevant anyway.) This position looks fine in my VCF (it is present and has the same ref and alt alleles as in the 1000 Genomes data). Is there anything obvious that would cause this error?
Sorry if this is an obvious oversight on my part... The output that I did get looks beautiful, so thank you!
Many thanks, Tatiana
I wonder if you are getting a mix of peddy modules. the new version has a different set of sites and so maybe you're somehow getting the old PCA module? I suspect this is the case because you're running it out of the source directory.
this should be fixed with local imports in peddy, but, pending that fix, you can also check that you only have 1 peddy module available.
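One rough way to check for duplicate copies is to walk the import path and list every directory that holds something called `peddy`. This is only a sketch: the helper name `find_module_copies` is made up for illustration, and it only looks for plain packages and single-file modules, so it would miss egg-links or other install styles.

```python
import importlib.util
import os
import sys


def find_module_copies(name, search_paths=None):
    """Return each directory on the search path that holds a package or module `name`."""
    if search_paths is None:
        search_paths = sys.path
    found = []
    for entry in search_paths:
        base = entry or os.getcwd()  # an empty entry on sys.path means the current directory
        package_init = os.path.join(base, name, "__init__.py")
        single_file = os.path.join(base, name + ".py")
        if os.path.isfile(package_init) or os.path.isfile(single_file):
            real = os.path.realpath(base)
            if real not in found:  # skip the same directory listed twice
                found.append(real)
    return found


if __name__ == "__main__":
    copies = find_module_copies("peddy")
    for path in copies:
        print("found peddy under:", path)
    if len(copies) > 1:
        print("more than one copy is importable; the earliest entry on sys.path wins")
    spec = importlib.util.find_spec("peddy")
    print("Python would import:", spec.origin if spec else "nothing (peddy not found)")
```

Running this from inside the cloned source directory and again from somewhere else should show whether the source tree is shadowing an installed copy.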
thanks for reporting.
Hi Brent,
Thank you for getting back to me. Just to let you know (in case it helps troubleshoot), I removed all things Peddy and reinstalled, and got the same error as above. I then ran on a different VCF, in case it was a problem with the original VCF file, and I got the same error up until the last line. Instead of the error at line 46, in pca, I got:
File "peddy/pca.py", line 60, in pca
clf = make_pipeline(PCA(n_components=4, whiten=True, copy=True, svd_solver="randomized"),
TypeError: __init__() got an unexpected keyword argument 'svd_solver'
Does this seem like a problem at my end (a problem with the installation or the VCF)? I don't want to waste your time!
All the best
you must have a very old version of scikit-learn. I would update that, re-run and report the error (if there is one).
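To confirm this before upgrading, you can print the installed version and ask Python whether `PCA` accepts `svd_solver` at all. The sketch below uses a made-up helper, `accepts_keyword`, built on the standard `inspect` module. As far as I know the `svd_solver` argument arrived in scikit-learn 0.18, but treat that version number as something to verify rather than a given.

```python
import inspect


def accepts_keyword(callable_obj, name):
    """Return True if `callable_obj` can be called with the keyword argument `name`."""
    params = inspect.signature(callable_obj).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        import sklearn
        from sklearn.decomposition import PCA
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        print("scikit-learn version:", sklearn.__version__)
        print("PCA accepts svd_solver:", accepts_keyword(PCA, "svd_solver"))
```

If the last line prints `False`, the TypeError above is expected and updating scikit-learn is the fix.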
Which version of scikit-learn should I choose to fix this bug?
@ruizgo which error? the svd_solver error? Any recent version is fine.
If you are seeing an error like in the first message (KeyError: u'1:900505:G:C'), then you must be using a set of sites that doesn't match the ones used to create the thousand genomes labeled set. Can you share the command you ran along with the full error message?
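A quick way to see that kind of mismatch is to build `chrom:pos:ref:alt` keys from the VCF and compare them against the site list. This is a minimal sketch: the key format is only inferred from the KeyError text above, the function names `site_keys` and `compare_sites` are hypothetical, and it reads plain (uncompressed) VCF lines rather than using peddy's own code.

```python
def site_keys(vcf_lines):
    """Yield one 'chrom:pos:ref:alt' key per alternate allele in each VCF record."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete record
        chrom, pos, _id, ref, alt = fields[:5]
        for allele in alt.split(","):
            yield "{}:{}:{}:{}".format(chrom, pos, ref, allele)


def compare_sites(vcf_lines, reference_sites):
    """Split site keys into shared, VCF-only, and reference-only sets."""
    observed = set(site_keys(vcf_lines))
    reference = set(reference_sites)
    return {
        "shared": observed & reference,
        "only_in_vcf": observed - reference,
        "only_in_reference": reference - observed,
    }


if __name__ == "__main__":
    example_vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "1\t900505\t.\tG\tC\n",
    ]
    result = compare_sites(example_vcf, ["1:900505:G:C", "1:900506:A:T"])
    for label, keys in sorted(result.items()):
        print(label, sorted(keys))
```

A key that appears under `only_in_vcf` for one site list but under `shared` for another would point to two different site sets being mixed.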
2023-05-18 03:06:57 01a458450dfc peddy.cli[786] INFO Running Peddy version 0.4.8
2023-05-18 03:06:57 01a458450dfc peddy.cli[786] INFO ped_check
2023-05-18 03:06:57 01a458450dfc peddy.cli[786] INFO ran in 0.3 seconds
2023-05-18 03:06:57 01a458450dfc peddy.cli[786] INFO het_check
2023-05-18 03:06:58 01a458450dfc peddy.pca[786] INFO loaded and subsetted thousand-genomes genotypes (shape: (2504, 1)) in 0.4 seconds
Traceback (most recent call last):
File "/usr/local/bin/peddy", line 33, in
This is the error output. It seems to come up when reading certain VCF files.
The shape (2504, 1) in the log means that only 1 SNP from the thousand genomes set was also present in your set. So most likely either your data is too sparse or you're using the wrong genome build.
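If the VCF header carries `##contig` lines, the length of chromosome 1 gives a rough hint about the build: 249250621 for GRCh37/hg19 and 248956422 for GRCh38/hg38. The sketch below assumes simple comma-separated contig fields with no quoted commas, and the function name `guess_build` is made up for illustration. Many VCFs have no contig lines at all, in which case it just reports "unknown".

```python
# Length of chromosome 1 in the two common human reference builds.
CHR1_LENGTHS = {
    249250621: "GRCh37/hg19",
    248956422: "GRCh38/hg38",
}


def guess_build(header_lines):
    """Guess the genome build from the chromosome 1 ##contig header line, if present."""
    prefix = "##contig=<"
    for line in header_lines:
        if not line.startswith(prefix):
            continue
        body = line.strip()[len(prefix):].rstrip(">")
        # Turn 'ID=1,length=249250621,...' into a dict of key/value pairs.
        fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
        if fields.get("ID") in ("1", "chr1") and fields.get("length", "").isdigit():
            return CHR1_LENGTHS.get(int(fields["length"]), "unknown")
    return "unknown"


if __name__ == "__main__":
    header = [
        "##fileformat=VCFv4.2",
        "##contig=<ID=1,length=249250621,assembly=b37>",
    ]
    print("build looks like:", guess_build(header))
```

Whatever this reports should match the build of the site list peddy is comparing against.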
I will check the upstream operation, thank you for your reply!