
Got "IndexError: list index out of range" in generate_multihetsep.py

Open • ymatmt opened this issue 4 years ago • 12 comments

I am trying to run MSMC on cat data, but I get the following error:

msmc-tools-master/generate_multihetsep.py --chr=${CHR} --mask=${BAM}out_mask_chr${CHR}.vcf.gz ${VCF}${CHR}phased.vcf.gz > ${VCF}${CHR}_multihetsep.txt

generating msmc input file with 2 haplotypes
adding mask: cat_msmc_test/bam/ERR2497923_sorted.bam_out_mask_chrA1.vcf.gz
Traceback (most recent call last):
  File "msmc-tools-master/generate_multihetsep.py", line 200, in <module>
    maskIterators.append(MaskIterator(f))
  File "msmc-tools-master/generate_multihetsep.py", line 19, in __init__
    self.readLine()
  File "msmc-tools-master/generate_multihetsep.py", line 29, in readLine
    self.start = int(fields[1]) + 1
IndexError: list index out of range

I wonder whether the error could be caused by the cat chromosome names (e.g. A1, A2, ...). Do you have any ideas on how to solve it? Thank you!

ymatmt, Jun 02 '20

Are you sure you have a correct mask file here? It seems you're giving it a VCF instead of a BED file for the mask.

stschiff, Apr 29 '21
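The traceback above fails at `self.start = int(fields[1]) + 1`, which suggests the mask reader splits each line on whitespace and expects BED-style columns (chrom, start, end). The sketch below mimics that step under this assumption; `parse_mask_line` is a hypothetical helper, not the script's API, and the handling of the end column is an assumption as well. It shows why the first line of a VCF, a single token such as `##fileformat=VCFv4.2`, cannot be parsed as a mask interval.

```python
def parse_mask_line(line):
    """Parse one BED-style mask line into a 1-based, inclusive (start, end) interval.

    Hypothetical helper mirroring the failing line in the traceback: BED starts
    are 0-based, so 1 is added to column index 1; the end (index 2) is kept as-is.
    """
    fields = line.strip().split()
    start = int(fields[1]) + 1  # IndexError if the line has fewer than 2 columns
    end = int(fields[2])
    return start, end


# A BED line parses fine.
print(parse_mask_line("A1\t0\t15000"))  # (1, 15000)

# A VCF header line is one whitespace-free token, so fields[1] does not exist.
try:
    parse_mask_line("##fileformat=VCFv4.2")
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```

In this sketch the chromosome name is never inspected, so, under the same assumption, names such as A1 would not trigger this particular error; the column count of the mask file would.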

Hi! I'm getting a similar error, but this time while reading the VCF file.

Traceback (most recent call last):
  File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 195, in <module>
    joinedVcfIterator = JoinedVcfIterator(args.files, trios)
  File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 131, in __init__
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 131, in <listcomp>
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "/proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py", line 68, in __next__
    chrom = fields[0]

The command I used is this: python3 /proj/snic2018-8-331/private/src/msmc_workflow/msmc-tools-master/generate_multihetsep.py --mask=mask_files/$ind.$chr.bed.gz --mask=mapping_mask/Oar3.1.maskchr${chr}.mask.bed.gz vcf_out/$ind.$chr.vcf.gz > input_singleind/$ind.$chr.ff.msmc.inp

Python version: Python 3.7.3

Hjorvik, Mar 08 '22

Could you post the error message itself, not just the traceback, please?

stschiff, Mar 20 '22

The error message was the same as in the first message: "IndexError: list index out of range"

Hjorvik, Mar 21 '22

Well, if the error occurs at chrom = fields[0], it means there is an empty line in the input that the script is trying to parse. Something must be off with your input files.

stschiff, Mar 25 '22
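Assuming the script splits each VCF line on whitespace, `fields[0]` can only fail when a line splits into nothing, i.e. it is empty or whitespace-only. A small sketch for locating such lines before running the script; `open_text`, `empty_line_numbers` and `find_empty_lines` are hypothetical helpers, and the input is assumed to be plain text or gzip/bgzip-compressed.

```python
import gzip


def open_text(path):
    """Open a VCF as text; .gz files (gzip or bgzip) are decompressed on the fly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def empty_line_numbers(lines):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    return [n for n, line in enumerate(lines, start=1) if not line.strip()]


def find_empty_lines(path):
    """Scan a VCF file and report where its empty lines are."""
    with open_text(path) as handle:
        return empty_line_numbers(handle)


# Splitting an empty line gives an empty list, so fields[0] has nothing to return.
assert "\n".strip().split() == []
```

For example, `find_empty_lines("vcf_out/ind.1.vcf.gz")` (hypothetical path) returns an empty list for a clean file and the offending line numbers otherwise.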

@stschiff I am using Python 3.9. The command line I use and its output are as follows.

python generate_multihetsep.py --chr 1 --mask 1Z-CWX10A-1.mask.bed.gz --mask 2QBZ-LFT3-1.mask.bed.gz --mask 3HNHZ-BLM3-1.mask.bed.gz --mask 4YZ-BLM2-1.mask.bed.gz --mask reLG01.mask.bed.gz 1Z-CWX10A-1.vcf.gz 2QBZ-LFT3-1.vcf.gz 3HNHZ-BLM3-1.vcf.gz 4YZ-BLM2-1.vcf.gz > LG01.multihetsep.txt

generating msmc input file with 8 haplotypes
Traceback (most recent call last):
  File "generate_multihetsep.py", line 195, in <module>
    joinedVcfIterator = JoinedVcfIterator(args.files, trios)
  File "generate_multihetsep.py", line 131, in __init__
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "generate_multihetsep.py", line 131, in <listcomp>
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "generate_multihetsep.py", line 73, in __next__
    geno = fields[9][:3]
IndexError: list index out of range

I saw that @Hjorvik had a similar problem; the difference is that in my case the last frame of the traceback is line 73, in __next__: geno = fields[9][:3].

What is the cause of this error, and how can I fix it? Looking forward to your reply!

yangwukaidi, May 16 '23

This means that one of your VCFs is not in the right shape. My program expects the genotypes (e.g. 0|1) in the 10th column (index 9), and that doesn't seem to be the case for at least one line in your input.

stschiff, May 16 '23
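Following the comment above, the genotype is read from the 10th column (index 9), and the traceback shows its first three characters being taken (`fields[9][:3]`). A pre-flight check under those assumptions might look like this; `check_vcf_lines` is a hypothetical helper, and the accepted genotype pattern (single-digit alleles separated by `|` or `/`) is an assumption, not the script's documented contract. Columns are split on tabs, as the VCF specification prescribes.

```python
import re

# Assumed genotype shape: single-digit alleles joined by "|" (phased) or "/" (unphased).
GENOTYPE = re.compile(r"^[0-9][|/][0-9]$")


def check_vcf_lines(lines):
    """Yield (line_number, problem) for data lines a fields[9][:3] parser would choke on."""
    for n, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # meta-information and column-header lines are skipped
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            # A sites-only VCF has just the 8 fixed columns and no sample column.
            yield n, f"{len(fields)} column(s), need at least 10 (no sample column?)"
        elif not GENOTYPE.match(fields[9][:3]):
            # Also flags missing genotypes such as ./.
            yield n, f"unexpected genotype {fields[9][:3]!r}"
```

To scan a compressed file, pass `gzip.open(path, "rt")` as `lines` and print whatever the generator yields; an empty result means every data line has a sample column starting with a genotype like `0|1`.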

Hi! @stschiff

I encountered a similar IndexError, but this time at alleles = [fields[3]]:

Traceback (most recent call last):
  File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 202, in <module>
    joinedVcfIterator = JoinedVcfIterator(args.files, trios, as_phased)
  File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 132, in __init__
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 132, in <listcomp>
    self.current_lines = [next(v) for v in self.vcfIterators]
  File "/projappl/project_2005832/msmc-tools/generate_multihetsep.py", line 71, in __next__
    alleles = [fields[3]]
IndexError: list index out of range

The command I am running is:

${TOOLdir}generate_multihetsep.py --chr ${LG} \
  --mask=${INDVMASKdir}${CHILD1}${LG}.bed.gz \
  --mask=${INDVMASKdir}${DAD1}${LG}.bed.gz \
  --mask=${INDVMASKdir}${MOM1}${LG}.bed.gz \
  --mask=${INDVMASKdir}${CHILD2}${LG}.bed.gz \
  --mask=${INDVMASKdir}${DAD2}${LG}.bed.gz \
  --mask=${INDVMASKdir}${MOM2}${LG}.bed.gz \
  --trio 1,2,3 \
  --trio 4,5,6 \
  --mask= ${MAPMASKdir}V7_${LG}.mask.bed.gz \
  ${VCFdir}${CHILD1}${LG}.vcf.gz ${VCFdir}${DAD1}${LG}.vcf.gz ${VCFdir}${MOM1}${LG}.vcf.gz \
  ${VCFdir}${CHILD2}${LG}.vcf.gz ${VCFdir}${DAD2}${LG}.vcf.gz ${VCFdir}${MOM2}${LG}.vcf.gz \
  > $OUTDIR/${CHILD1}.${LG}.multihetsep.txt

Could you suggest why this issue might occur? Thanks, and I look forward to your reply!

zcharlene, Sep 17 '23

As above, please check that your VCF file has the shape my script expects. See my previous comment.

stschiff, Sep 18 '23

Hi! @stschiff

Thank you so much for your quick response.

I would like to provide additional information regarding the issue I'm facing. Initially, I thought that including the trio information in the script would eliminate the need to phase the VCF files. However, as a troubleshooting step, I decided to phase the VCF files anyway. The heterozygotes are consistently represented as '0|1' or '1|0' after phasing. Unfortunately, despite this effort, I am still encountering the same error.

I would greatly appreciate any further suggestions or guidance you can offer to help resolve this issue. Thank you for your attention.

zcharlene, Sep 20 '23

Hi @stschiff

After some investigation, I found that the problem is actually the output file from SNPable: it works when I remove the mappability mask. What do you think the potential impact of removing this mask on my final results would be? Thanks!

zcharlene, Sep 21 '23
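One thing that may be worth checking in the command above: the mappability mask is passed as `--mask= ${MAPMASKdir}V7_${LG}.mask.bed.gz`, with a space after `=`. If the script parses its options with Python's `argparse` (an assumption; the parser below is a hypothetical stand-in, not the script's actual definition), then `--mask=` takes an empty string as its value and the BED file becomes one more positional input, i.e. it would be read as a VCF. A BED line has only three columns, which would fit the failure at `alleles = [fields[3]]` and the observation that dropping this mask makes the error go away.

```python
import argparse

# Hypothetical stand-in for the script's option parsing: a repeatable --mask,
# a --chr option, and one or more positional input files.
parser = argparse.ArgumentParser()
parser.add_argument("--chr")
parser.add_argument("--mask", action="append")
parser.add_argument("files", nargs="+")

# Mirrors the spacing in the command above (paths shortened for illustration).
args = parser.parse_args(
    ["--chr", "LG01", "--mask=", "V7_LG01.mask.bed.gz", "child.vcf.gz", "dad.vcf.gz"]
)
print(args.mask)   # ['']
print(args.files)  # ['V7_LG01.mask.bed.gz', 'child.vcf.gz', 'dad.vcf.gz']

# Without the space, the BED file stays a mask and only the VCFs are positional.
fixed = parser.parse_args(
    ["--chr", "LG01", "--mask=V7_LG01.mask.bed.gz", "child.vcf.gz", "dad.vcf.gz"]
)
print(fixed.mask)   # ['V7_LG01.mask.bed.gz']
print(fixed.files)  # ['child.vcf.gz', 'dad.vcf.gz']
```

If this is indeed the cause, writing the option as `--mask=${MAPMASKdir}V7_${LG}.mask.bed.gz` should keep the mappability mask in the analysis instead of dropping it.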

Could it just be that you use --trio 1,2,3 --trio 4,5,6 when it should actually be --trio 0,1,2 --trio 3,4,5? My indices are all 0-based, not 1-based.

stschiff, Sep 26 '23
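Since the trio indices are 0-based positions in the list of input VCFs, a quick sanity check is to convert and range-check them before running the script. A minimal sketch; `to_zero_based` and `check_trio` are hypothetical helpers, and the only assumption is that a trio is written as three comma-separated integers.

```python
from typing import Tuple


def to_zero_based(spec: str) -> str:
    """Convert a 1-based trio spec such as '4,5,6' to the 0-based form '3,4,5'."""
    return ",".join(str(int(i) - 1) for i in spec.split(","))


def check_trio(spec: str, n_vcfs: int) -> Tuple[int, ...]:
    """Parse a 0-based trio spec and make sure every index points at an input VCF."""
    indices = tuple(int(i) for i in spec.split(","))
    if len(indices) != 3:
        raise ValueError(f"trio {spec!r} must have exactly three indices")
    out_of_range = [i for i in indices if not 0 <= i < n_vcfs]
    if out_of_range:
        raise ValueError(f"trio {spec!r}: indices {out_of_range} not in 0..{n_vcfs - 1}")
    return indices
```

With six input VCFs, `to_zero_based("4,5,6")` gives `"3,4,5"`, and `check_trio("3,4,5", 6)` returns `(3, 4, 5)`, while `check_trio("4,5,6", 6)` raises a `ValueError` because there is no file at index 6.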