CLEAR

exonFrames field is being added, -genePredExt but no valid frames

Open sharmi85 opened this issue 4 years ago • 6 comments

Hello. I am trying to use CLEAR for my data set and am running the following command:

clear_quant -1 /userdata/sharmishtha/Hela/trimmedFastqFiles/trim_HeLa-AMT-1_R1.fastq.gz -2 /userdata/sharmishtha/Hela/trimmedFastqFiles/trim_HeLa-AMT-1_R2.fastq.gz -g /userdata/sharmishtha/ref_and_anno/hg38/hg38.fa -i /userdata/sharmishtha/IndexFiles/hg38/hisat2index/hg38_hisat2_index -j /userdata/sharmishtha/IndexFiles/hg38/bowtie1_index/bowtie1_index -G /userdata/sharmishtha/IndexFiles/hg38/hg38_kg.gtf -o HelaAMT1_output_dir

The steps up to TopHat-Fusion worked, but I got an error after TopHat-Fusion:

###Start circRNA annotation
Error: exonFrames field is being added, but I found a gene (ENST00000602051.5) with CDS but no valid frames. This can happen if program is invoked with -genePredExt but no valid frames are given in the file. If the 8th field of GFF/GTF file is always a placeholder, then don't use -genePredExt.
Traceback (most recent call last):
  File "/userdata/sharmishtha/tools/anaconda3/envs/myenv/bin/clear_quant", line 11, in <module>
    load_entry_point('CLEAR==1.0.0', 'console_scripts', 'clear_quant')()
  File "build/bdist.linux-x86_64/egg/src/run.py", line 262, in main
  File "build/bdist.linux-x86_64/egg/src/run.py", line 173, in circ_annot
  File "/userdata/sharmishtha/tools/anaconda3/envs/myenv/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['gtfToGenePred', '-genePredExt', '/userdata/sharmishtha/IndexFiles/hg38/hg38_kg.gtf', 'HelaAMT1_output_dir/circ/genePred.tmp']' returned non-zero exit status 255

I used the CIRCexplorer2 command to get the GTF file:

cut -f2-11 hg38_ref.txt | genePredToGtf file stdin hg38_ref.gtf

So I don't know what is going on. Why is the GTF file giving this error? Kindly help.
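The traceback shows that the failure comes from a single external call, gtfToGenePred with -genePredExt, so it may be quicker to test a GTF file against that step alone before rerunning the whole pipeline. The following is a minimal Python 3 sketch under that assumption; the function names and the default output path are hypothetical and are not part of CLEAR.

```python
import subprocess


def gtf_to_genepred_cmd(gtf_path, out_path):
    """Build the same argument list that appears in the traceback."""
    return ["gtfToGenePred", "-genePredExt", gtf_path, out_path]


def check_gtf(gtf_path, out_path="genePred.check.tmp"):
    """Run the conversion step on its own and return (ok, stderr_text).

    out_path is a hypothetical scratch file name chosen for this sketch.
    """
    cmd = gtf_to_genepred_cmd(gtf_path, out_path)
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # The UCSC utility is not installed or not on PATH.
        return False, "gtfToGenePred not found on PATH"
    return proc.returncode == 0, proc.stderr.strip()


if __name__ == "__main__":
    ok, message = check_gtf("hg38_kg.gtf")
    print("conversion succeeded" if ok else "conversion failed: " + message)
```

If the check fails with the same exonFrames message, the problem is in the GTF file or in how gtfToGenePred is invoked, not in the earlier alignment steps.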

sharmi85 avatar Dec 28 '20 06:12 sharmi85

I tried a different file, but it doesn't work either. Did you use the known genes file for the annotation? What command did you use to download the known genes txt file, and what command did you use to convert it to GTF format? I used the commands listed in the CIRCexplorer2 pipeline.

Download the human known genes annotation file (downloaded on 28th Dec 2020):

fetch_ucsc.py hg38 kg hg38_kg.txt

Convert the gene annotation file to GTF format (requires genePredToGtf; converted on 28th Dec 2020):

cut -f2-11 hg38_kg.txt | genePredToGtf file stdin hg38_kg.gtf

Please help. I am stuck at the circRNA annotation step.

Thanks, Sharmi

sharmi85 avatar Dec 29 '20 06:12 sharmi85

I think the command below will solve this problem:

perl -alne '$,="\t";print (@F[1..@F-1], 0, $F[0])' hg38_kg.txt | genePredToGtf file stdin hg38_kg.gtf

The resulting hg38_kg.gtf is the file that clear_quant needs.
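For readers less familiar with Perl, the one-liner above splits each line on whitespace, drops the first field from the front, and appends a 0 followed by that first field, joined by tabs. A rough Python equivalent is sketched below; the function name and the sample line are made up for illustration and do not come from the real hg38_kg.txt.

```python
def reorder_kg_line(line):
    """Move the first whitespace-separated field to the end, with a 0 before it.

    Mirrors: perl -alne '$,="\t";print (@F[1..@F-1], 0, $F[0])'
    """
    fields = line.split()  # perl -a also splits on runs of whitespace
    return "\t".join(fields[1:] + ["0", fields[0]])


if __name__ == "__main__":
    # Hypothetical, shortened record: the first field is moved to the end.
    sample = "GENE1\tENST0000001\tchr1\t+\n"
    print(reorder_kg_line(sample))
```

Applying the function to every line of hg38_kg.txt and piping the result to genePredToGtf should correspond to the Perl command, assuming no field in the input contains embedded whitespace.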

xingma avatar Dec 30 '20 01:12 xingma

Thank you so much. This should help. I will try it and update you. Thanks, Sharmi

-- Regards

Sharmishtha Shyamal, PhD Research Associate RNA Biology Lab Institute of Life Science-DBT Bhubaneshwar, Odisha India

sharmi85 avatar Dec 30 '20 06:12 sharmi85

No, that didn't work. It gave me the same error:

###Start circRNA annotation
Error: exonFrames field is being added, but I found a gene (ENSMUST00000221646.1) with CDS but no valid frames. This can happen if program is invoked with -genePredExt but no valid frames are given in the file. If the 8th field of GFF/GTF file is always a placeholder, then don't use -genePredExt.
Traceback (most recent call last):
  File "/userdata/sharmishtha/tools/anaconda3/envs/myenv/bin/clear_quant", line 11, in <module>
    load_entry_point('CLEAR==1.0.0', 'console_scripts', 'clear_quant')()
  File "build/bdist.linux-x86_64/egg/src/run.py", line 262, in main
  File "build/bdist.linux-x86_64/egg/src/run.py", line 173, in circ_annot
  File "/userdata/sharmishtha/tools/anaconda3/envs/myenv/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['gtfToGenePred', '-genePredExt', '/userdata/sharmishtha/ref_and_anno/mm10_20/mm10_kg.gtf', '66Old_output_dir/circ/genePred.tmp']' returned non-zero exit status 255

sharmi85 avatar Dec 30 '20 09:12 sharmi85

Hi, this problem is caused by a few transcript annotations with unusual start codon and stop codon positions. I have updated CLEAR to 1.0.1 to solve this problem. Thanks.
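For anyone who wants to see which transcripts are affected in their own annotation, the error message quoted in this thread points at the 8th GTF field (the frame). The sketch below lists transcripts that have CDS lines but no CDS line with a frame of 0, 1, or 2. It is only an approximation of the condition described in that message, not the actual logic of gtfToGenePred or of the CLEAR 1.0.1 fix, and the function name is hypothetical.

```python
import re

VALID_FRAMES = {"0", "1", "2"}
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_without_valid_frames(gtf_lines):
    """Return sorted transcript IDs whose CDS lines never carry a valid frame."""
    has_valid_frame = {}  # transcript_id -> True once a CDS frame of 0/1/2 is seen
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue  # only CDS features are relevant here
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue
        tid = match.group(1)
        seen = has_valid_frame.get(tid, False)
        has_valid_frame[tid] = seen or cols[7] in VALID_FRAMES
    return sorted(tid for tid, ok in has_valid_frame.items() if not ok)


if __name__ == "__main__":
    with open("hg38_kg.gtf") as handle:
        for tid in transcripts_without_valid_frames(handle):
            print(tid)
```

An empty result does not guarantee that gtfToGenePred will accept the file, since the tool applies its own checks to start and stop codon positions.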

xingma avatar Dec 31 '20 08:12 xingma

Thank you for your response. I will update to the new version and try again.

sharmi85 avatar Dec 31 '20 08:12 sharmi85