msmc-tools
Got "IndexError: list index out of range" in generate_multihetsep.py
I am trying to run MSMC on a cat species, but I get the following error.
msmc-tools-master/generate_multihetsep.py --chr=${CHR} --mask=${BAM}out_mask_chr${CHR}.vcf.gz ${VCF}${CHR}phased.vcf.gz > ${VCF}${CHR}_multihetsep.txt
generating msmc input file with 2 haplotypes
adding mask: cat_msmc_test/bam/ERR2497923_sorted.bam_out_mask_chrA1.vcf.gz
Traceback (most recent call last):
File "msmc-tools-master/generate_multihetsep.py", line 200, in
I wonder whether the error could be caused by the chromosome names in cat (e.g. A1, A2, ...). Do you have any ideas on how to solve it? Thank you!
Are you sure you have a correct mask file here? It seems you're giving it a VCF instead of a BED file for the mask.
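To check quickly whether a mask file is BED-like rather than a VCF, something along these lines may help. It is only a rough sketch: the function name looks_like_bed is made up for illustration, and the heuristic (at least three columns, with integer start and end in columns 2 and 3) is an assumption about what a usable mask looks like, not a check taken from msmc-tools.

```python
import gzip

def looks_like_bed(lines):
    """Heuristic: True if every data line has >= 3 tab-separated fields
    with integer start and end in the 2nd and 3rd columns."""
    seen_data = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##fileformat=VCF"):
            return False  # explicit VCF header
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 3:
            return False
        if not (fields[1].isdigit() and fields[2].isdigit()):
            return False  # in a VCF, the 3rd column is the ID, usually "."
        seen_data = True
    return seen_data

# Example use on a gzipped file (the path is a placeholder):
# with gzip.open("mask.bed.gz", "rt") as fh:
#     print(looks_like_bed(fh))
```

A file that fails this check is not necessarily wrong, but it is worth opening and inspecting by hand before passing it to --mask.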
Hi! I'm facing a similar error message, but this time while reading the VCF file.
Traceback (most recent call last):
File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 195, in
The command I used is this: python3 /proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py --mask=mask_files/$ind.$chr.bed.gz --mask=mapping_mask/Oar3.1.maskchr${chr}.mask.bed.gz vcf_out/$ind.$chr.vcf.gz > input_singleind/$ind.$chr.ff.msmc.inp
Python version: Python 3.7.3
could you post the error message, not just the Traceback, please?
The error message was the same as in the first message: "IndexError: list index out of range"
Well, if the error occurred with the command chromosome = fields[0], it means that you have an empty line there that you're trying to parse. Your input files must be off.
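One way to locate such empty lines is to scan the input and report their line numbers. The sketch below assumes the script splits each line on whitespace before indexing into the result; the helper name blank_line_numbers is hypothetical and not part of msmc-tools.

```python
import gzip

def blank_line_numbers(lines):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    # "".split() returns [], and indexing [0] on an empty list raises IndexError
    return [i for i, line in enumerate(lines, start=1) if not line.split()]

# Example use on a gzipped mask or VCF (the path is a placeholder):
# with gzip.open("input.bed.gz", "rt") as fh:
#     print(blank_line_numbers(fh))
```

If the list is not empty, removing those lines (or regenerating the file) should be tried before rerunning generate_multihetsep.py.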
@stschiff
I am using python 3.9. The command line I use and output are as follows.
python generate_multihetsep.py --chr 1 --mask 1Z-CWX10A-1.mask.bed.gz --mask 2QBZ-LFT3-1.mask.bed.gz --mask 3HNHZ-BLM3-1.mask.bed.gz --mask 4YZ-BLM2-1.mask.bed.gz --mask reLG01.mask.bed.gz 1Z-CWX10A-1.vcf.gz 2QBZ-LFT3-1.vcf.gz 3HNHZ-BLM3-1.vcf.gz 4YZ-BLM2-1.vcf.gz > LG01.multihetsep.txt
generating msmc input file with 8 haplotypes
Traceback (most recent call last):
File "generate_multihetsep.py", line 195, in
I saw that @Hjorvik had a similar problem. The difference is that the last line of the output returned by my command is: File "generate_multihetsep.py", line 73, in next geno = fields[9][:3]
What is the cause of this error, and how can I fix it? Looking forward to your reply!
This means that one of your VCFs is not in the right shape. My program expects the genotypes (e.g. 0|1) in the 10th column (so index 9), and that doesn't seem to be the case in one line of your input.
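To find the offending line, one could list every VCF record that has fewer than 10 columns. This is a minimal sketch, assuming whitespace-separated fields and that lines starting with "#" are headers; short_vcf_rows is an illustrative name, not a function from msmc-tools.

```python
import gzip

def short_vcf_rows(lines, min_cols=10):
    """Return (line_number, column_count) for records with fewer than min_cols fields."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip meta-information and the #CHROM header line
        fields = line.split()
        if len(fields) < min_cols:
            bad.append((lineno, len(fields)))  # fields[9] would fail on this row
    return bad

# Example use on a gzipped VCF (the path is a placeholder):
# with gzip.open("sample.vcf.gz", "rt") as fh:
#     for lineno, ncols in short_vcf_rows(fh):
#         print(f"line {lineno}: only {ncols} columns")
```

Blank lines show up here with a column count of 0, and a sites-only VCF without a sample column shows up on every record.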
Hi! @stschiff
I encountered a similar IndexError but this time about "alleles = [fields[3]]":
Traceback (most recent call last):
File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 202, in
The code I am applying is:
${TOOLdir}generate_multihetsep.py --chr ${LG}
--mask=${INDVMASKdir}${CHILD1}${LG}.bed.gz
--mask=${INDVMASKdir}${DAD1}${LG}.bed.gz
--mask=${INDVMASKdir}${MOM1}${LG}.bed.gz
--mask=${INDVMASKdir}${CHILD2}${LG}.bed.gz
--mask=${INDVMASKdir}${DAD2}${LG}.bed.gz
--mask=${INDVMASKdir}${MOM2}${LG}.bed.gz
--trio 1,2,3
--trio 4,5,6
--mask= ${MAPMASKdir}V7_${LG}.mask.bed.gz
${VCFdir}${CHILD1}${LG}.vcf.gz ${VCFdir}${DAD1}${LG}.vcf.gz ${VCFdir}${MOM1}${LG}.vcf.gz
${VCFdir}${CHILD2}${LG}.vcf.gz ${VCFdir}${DAD2}${LG}.vcf.gz ${VCFdir}${MOM2}${LG}.vcf.gz
> $OUTDIR/${CHILD1}.${LG}.multihetsep.txt
Would you please suggest why this issue potentially occurs? Thanks and look forward to your reply!
As above, please check that your VCF file has the shape that my script expects. See my previous comment.
Hi! @stschiff
Thank you so much for your quick response.
I would like to provide additional information regarding the issue I'm facing. Initially, I thought that including the trio information in the script would eliminate the need to phase the VCF files. However, as a troubleshooting step, I decided to phase the VCF files anyway. The heterozygotes are consistently represented as '0|1' or '1|0' after phasing. Unfortunately, despite this effort, I am still encountering the same error.
I would greatly appreciate any further suggestions or guidance you can offer to help resolve this issue. Thank you for your attention.
Hi @stschiff
After some investigation, I found that the problem is actually with the output file from SNPable. It works when I remove the mappability masks. What do you think is the potential impact of removing this mask on my final results? Thanks!
Could it just be that you use --trio 1,2,3 --trio 4,5,6 when it should actually be --trio 0,1,2 --trio 3,4,5? My indices are all 0-based, not 1-based.
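For illustration, converting 1-based sample positions to the 0-based form could look like the sketch below. The helper to_zero_based is hypothetical. With six VCFs the valid indices are 0 to 5, so a 1-based index of 6 would point past the end of the list.

```python
def to_zero_based(trio):
    """Convert a comma-separated triple of 1-based indices to 0-based."""
    indices = [int(x) - 1 for x in trio.split(",")]
    if min(indices) < 0:
        raise ValueError("input indices must be 1-based (>= 1)")
    return ",".join(str(i) for i in indices)

print(to_zero_based("1,2,3"))
print(to_zero_based("4,5,6"))
```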