bio icon indicating copy to clipboard operation
bio copied to clipboard

Run error: panic: start must be less than end

Open TendoLiu opened this issue 5 years ago • 8 comments

bash-4.2$ bio-fusion \

-r1=$R1
-r2=$R2
-transcript=$transcript I1104 12:57:44.433680 11841 main.go:336] Start reading geneDB I1104 12:57:44.556241 11841 gene_db.go:310] Reading transcriptome /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa denovo panic: start must be less than end

goroutine 90 [running]: github.com/grailbio/bio/fusion.(*GeneDB).ReadTranscriptome.func5(0xc000050f00, 0xc00049a5c0, 0xc0003d2a20, 0xc0004865a0, 0xc00024b530, 0xc0003ea050) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:356 +0x2bd created by github.com/grailbio/bio/fusion.(*GeneDB).ReadTranscriptome /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:345 +0x3bb

TendoLiu avatar Nov 04 '19 17:11 TendoLiu

can you confirm where you have obtained $transcript file thanks

eksyang avatar Nov 04 '19 20:11 eksyang

Thank you so much. Please see my script and the log.

Script: `module load af4 hg19_UCSC=/bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/ucsc.hg19.fasta/ucsc.hg19.fasta anno=/bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/Other/annotation/gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz

bio-fusion -generate-transcriptome
-output /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa
-keep-mitochondrial-genes
-keep-readthrough-transcripts
-keep-pary-locus-transcripts
-keep-versioned-genes
$anno $hg19_UCSC

R1=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion/fastq_merged/Sample1/Sample1.R1.fastq R2=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion/fastq_merged/Sample1/Sample1.R2.fastq transcript=/bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa out=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion

bio-fusion
-r1=$R1
-r2=$R2
-umi-in-name
-fasta-output=$out/test.out.fa
-filtered-output=$out/test.filtered.fa
-transcript=$transcript
-output $out ` Log:

`2019/11/04 15:48:41 Creating /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa I1104 15:48:41.464042 15119 parsegencode.go:246] GTF: /bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/Other/annotation/gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz I1104 15:48:50.646038 15119 parsegencode.go:248] Read 66738 genes, 247086 transcripts, 1478937 exons I1104 15:49:06.822649 15119 parsegencode.go:358] Successfully parsed 66738 genes, 247086 transcripts and 1478937 exons I1104 15:49:06.822692 15119 parsegencode.go:360] Retained 247086 transcripts and 1478937 exons E1104 15:49:33.213128 15119 parsegencode.go:490] invalid query range 63686370 - 63688050 for sequence chr20 with length 63025520 panic: invalid query range 63686370 - 63688050 for sequence chr20 with length 63025520

goroutine 1 [running]: github.com/grailbio/base/log.Panic(0xc0004c99d8, 0x1, 0x1) /ihome/crc/install/af4/src/github.com/grailbio/base/log/log.go:168 +0xc7 github.com/grailbio/bio/fusion/parsegencode.PrintParsedGTFRecords(0x1042300, 0xc00025ea40, 0x104a020, 0xc07dd83260, 0xc07c132000, 0x104b2, 0x104b2, 0x1010101000000) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/parsegencode/parsegencode.go:490 +0x576 github.com/grailbio/bio/fusion/cmd.GenerateTranscriptome(0x104bc20, 0xc00047d2c0, 0x7fff2027caa0, 0x8f, 0x7fff2027cb30, 0x6b, 0x0, 0x7fff2027c9d8, 0x5c, 0x0, ...) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/cmd/generate_transcriptome.go:114 +0x617 github.com/grailbio/bio/fusion/cmd.Run() /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/cmd/main.go:558 +0xaa8 main.main() /ihome/crc/install/af4/src/github.com/grailbio/bio/cmd/bio-fusion/main.go:9 +0x20

I1104 15:50:30.178381 15172 main.go:336] Start reading geneDB I1104 15:50:30.256422 15172 gene_db.go:310] Reading transcriptome /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa denovo panic: start must be less than end

goroutine 90 [running]: github.com/grailbio/bio/fusion.(*GeneDB).ReadTranscriptome.func5(0xc000050f00, 0xc000496580, 0xc0003c8a20, 0xc000476120, 0xc0002e5350, 0xc0003e0030) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:356 +0x2bd created by github.com/grailbio/bio/fusion.(*GeneDB).ReadTranscriptome /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:345 +0x3bb

`

TendoLiu avatar Nov 04 '19 20:11 TendoLiu

It seems that there was a "panic" when generating the transcriptome file, but the file was produced anyway. Note that I am using a newer version of annotation file. And the reference genome is hg19 instead of hg38 in the example code.

TendoLiu avatar Nov 04 '19 20:11 TendoLiu

gencode.v26.whole_genes.txt Just in case you want to look at the transcriptome file. I converted .fa to .txt to be able to upload.

TendoLiu avatar Nov 04 '19 21:11 TendoLiu

Thanks, will look into it

eksyang avatar Nov 04 '19 21:11 eksyang

Any follow-up?

TendoLiu avatar Dec 20 '19 22:12 TendoLiu

sorry, at the moment, I do not have linux machine to debug into it. But after relook into the issue, it appears that the sequence chr20 in the provided fasta file has length 63025520, however, there seems to be a gene in the gtf in the range of 63686370 - 63688050 in "chr20". The only issue I can think of is that the gtf file you used ($anno = gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz) and the reference fasta file ($hg19_UCSC = ucsc.hg19.fasta) are not compatible with each other. possible to confirm that ?

eksyang avatar Dec 21 '19 02:12 eksyang

hi TendoLiu, a colleague offerred to help after holidays. In order to debug into this, could you share or point to the link to obtain the two files gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz and ucsc.hg19.fasta (if you cannot yet confirm the annotation and the fasta file do not match). Sorry for the delay.

eksyang avatar Dec 21 '19 17:12 eksyang