Inconsistent gene IDs used in gtf_preprocess.py

Open catsargent opened this issue 6 years ago • 0 comments

Whilst using gtf_preprocess.py to create the expandCDS.fasta file, I obtained the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/scikit-ribo/gtf_preprocess.py", line 280, in <module>
    worker.getSeq()
  File "/usr/local/lib/python3.5/dist-packages/scikit-ribo/gtf_preprocess.py", line 154, in getSeq
    self.fiveUtrDic[geneName] + self.fastaDic[geneName] + self.threeUtrDic[geneName] + "\n")
KeyError: 'ENSG00000230989'

This appears to be because the 3utr.fasta, 5tr.fasta and cds.fasta files that were created have headers such as the following:

ENSG00000187961::1:960586-965715(+)

whereas the variable self.geneNames stores only the bare IDs, e.g. ENSG00000187961, so looking up a bare ID in dictionaries keyed by the full header would fail.
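To illustrate the mismatch, here is a minimal sketch of one possible workaround. It assumes the FASTA headers always follow the `ID::chrom:start-end(strand)` pattern shown above. The helpers `normalize_header` and `read_fasta` are hypothetical names of my own, not functions from scikit-ribo, and I have not checked how gtf_preprocess.py actually builds its dictionaries.

```python
def normalize_header(header):
    """Return the bare gene ID from a FASTA header.

    Assumes headers look like 'ENSG00000187961::1:960586-965715(+)',
    optionally prefixed with '>'. Everything from '::' onward is dropped.
    """
    return header.lstrip(">").split("::", 1)[0].strip()


def read_fasta(lines):
    """Build a {gene_id: sequence} dict keyed by the bare gene ID.

    Hypothetical helper for illustration only; not scikit-ribo code.
    """
    seqs = {}
    gene = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            gene = normalize_header(line)
            seqs[gene] = ""
        elif gene is not None:
            seqs[gene] += line
    return seqs


# Example records using the header format from this issue.
records = [">ENSG00000187961::1:960586-965715(+)", "ATGC", "GGTA"]
fasta_dic = read_fasta(records)
```

With keys normalised this way, a lookup by the bare ID stored in self.geneNames would find the entry. It would not help if a gene is genuinely absent from one of the three files, which might be a separate cause of the same KeyError.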

Given the previous issue that I raised and solved, please can you confirm whether there is a problem in the code that is giving rise to this error?

Many thanks, Catherine

catsargent, Apr 09 '18 13:04