kGWASflow
kGWASflow copied to clipboard
Manhattan plots fail
Hi Kivanc,
I've updated kGWASflow and tried to run the analysis including aligning the kmers. Unfortunately, generating the Manhattan plot yields errors, with unhelpful logs:
y_max: 19.284643066713652
y_min: 9.220349838454867
Plotting...
/MankFlex/wouter/color_selection_lines/kmergwas/kgwasflow/.snakemake/scripts/tmpk48uhlxn.plot_manhattan.py:141: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
align_kmers_sam_with_all_chrom_sorted = align_kmers_sam_with_all_chrom.sort_values(by=["chr", "bp"], key=natsort.natsort_keygen())
I don't really care about the manhattan plots, I can make those myself, but turning them off here didn't do anything. I tried to check, and as far as I can tell, that option does not appear to be used anywhere? Shouldn't the manhattan rule have an if statement like here?
I'll just continue without the aligned kmers, but thought I would let you know.
Hi @Ax3man,
Thank you for reporting this issue. You are right about the plot_manhattan
setting. I originally planned manhattan plot generation to be optional, but later I decided to keep it permanent as part of align_kmers
phase since it wasn't costing much time or memory, and technically any output from align_kmers
step should work with plot_manhattan
rule without any problem.
Would you mind sharing your SAM file output generated during align_kmers
step? It should be in "results/align_kmers/{your_pheno}/{your_pheno}_kmers_alignment.sam"
. I would like to see what caused the problem during Manhattan plot generation so I can fix it.
I will also make the necessary changes regarding plot_manhattan
setting.
Best, Kivanc
Hi Kivanc,
I'm getting the same error, I checked for the SAM output as you suggested and there wasn't one. All of log files in that directory were empty, except kmers_align.bowtie2.log, which read:
54 reads; of these:
54 (100.00%) were unpaired; of these:
0 (0.00%) aligned 0 times
36 (66.67%) aligned exactly 1 time
18 (33.33%) aligned >1 times
100.00% overall alignment rate
I'm assuming this is why the Manhattan plot isn't working, since the Manhattan plot error message is talking about an empty Series. Not sure why the SAM output isn't being created or why the Manhattan step is running without it, I will let you know if I track it down.
Thanks,
Cedar
I've generated the SAM output, here are the first few lines:
@HD VN:1.5 SO:unsorted GO:query
@SQ SN:SL4.0ch00 LN:9643250
@SQ SN:SL4.0ch01 LN:90863682
@SQ SN:SL4.0ch02 LN:53473368
@SQ SN:SL4.0ch03 LN:65298490
@SQ SN:SL4.0ch04 LN:64459972
@SQ SN:SL4.0ch05 LN:65269487
@SQ SN:SL4.0ch06 LN:47258699
@SQ SN:SL4.0ch07 LN:67883646
@SQ SN:SL4.0ch08 LN:63995357
@SQ SN:SL4.0ch09 LN:68513564
@SQ SN:SL4.0ch10 LN:64792705
@SQ SN:SL4.0ch11 LN:54379777
@SQ SN:SL4.0ch12 LN:66688036
@PG ID:bowtie2 PN:bowtie2 VN:2.4.5 CL:"/xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/.snakemake/conda/221bdcf96cc8a42228d668ab0f75056c_/bin/bowtie2-align-s --wrapper basic-0 -p 1 -x /xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/resources/ref/genome/bowtie2_index/genome -f /xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/results/fetch_kmers/anther_pistil_ratio_kmers_list.fa -S /home/u16/cedar/scratch/kGWASflow/aligned_kmers/aligned_kmers.sam"
kmer1_3.707106e-11 16 SL4.0ch09 48997168 1 31M * 0 0 TATTTCCGCACTTTGTTAGATATTTGGTTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:24G6 YT:Z:UU
kmer2_3.415117e-12 16 SL4.0ch01 10753768 24 31M * 0 0 GAGGTAATTATCAATTTCTTTTTGACATTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:25T5 YT:Z:UU
kmer3_5.486155e-13 16 SL4.0ch03 15427129 24 31M * 0 0 TCACTCTGTCTGAAAAAGAATGTCCTAGTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:2C28 YT:Z:UU
kmer4_8.487645e-12 16 SL4.0ch05 48950027 1 31M * 0 0 AACTCTCCTTTGTAGGTATGTACCTCCATTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:1C29 YT:Z:UU
I think the problem is that the regex in the plot_manhattan.py
script is not recognizing the chromosome format. I'm using the tomato genome.
Hi @cedarwarman, I had the same issue today and was able to continue making Manhattan plots by replacing this line:
https://github.com/akcorut/kGWASflow/blob/27654c8398249a5581c1463b2036211cb3fdbc0d/workflow/scripts/plot_manhattan.py#L73
with name = chromosome_name
and it will let you go on with your own chromosome names. If you have many smaller contigs that you want to leave out, you'll have to tweak the regex yourself.