
fusioncatcher-build fails for canis lupus familiaris, with IndexError: list index out of range

Open jowkar opened this issue 3 years ago • 7 comments

This issue is similar to #82 from 2018, except it could not be solved by changing servers, so the root cause might be something different this time. The latest version of FusionCatcher was installed with the recommended method. The error message is the following (the full log file is attached):

    Traceback (most recent call last):
      File "/home/joakim/bin/fusioncatcher_24_02_2021/fusioncatcher/bin/add_custom_gene.py", line 288, in <module>
        head = database[1]
    IndexError: list index out of range

stdout.txt

jowkar avatar Feb 24 '21 06:02 jowkar

Hi jowkar,

what version of FusionCatcher is there?

Cheers, Daniel

ndaniel avatar Feb 24 '21 07:02 ndaniel

v1.33

jowkar avatar Feb 24 '21 08:02 jowkar

I am trying to reproduce the bug. At first glance, it looks like Ensembl has changed the organism name from canis_familiaris to canis_lupus_familiaris.

ndaniel avatar Feb 24 '21 08:02 ndaniel

Yes, they have changed the name. The script does download some files, but some of them, such as exons.txt, end up empty. On lines 285-288 of add_custom_gene.py, the script then tries to read from this file (exons.txt) and gets nothing, which I think is what causes the error.
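To illustrate the failure mode, here is a minimal sketch of a guard that reports an empty or truncated file before indexing into it. How add_custom_gene.py actually parses exons.txt is not shown in this thread, so `read_rows` and `second_row` are hypothetical helpers, and the tab-separated layout is an assumption.

```python
import os
import tempfile


def read_rows(path):
    """Hypothetical reader: return the non-blank rows of a tab-separated file."""
    with open(path) as handle:
        return [line.rstrip('\r\n').split('\t') for line in handle if line.strip()]


def second_row(path):
    """Return the second row, or fail with a clear message if the file is too short."""
    database = read_rows(path)
    if len(database) < 2:
        # Without this check, database[1] raises "IndexError: list index out of range".
        raise ValueError("%s has %d row(s); the download probably failed"
                         % (path, len(database)))
    return database[1]


# An empty temporary file stands in for the empty exons.txt described above.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    empty_path = tmp.name

try:
    second_row(empty_path)
except ValueError as error:
    message = str(error)
    print(message)
finally:
    os.remove(empty_path)
```

A check like this would not fix the download itself, but it would point at the empty file rather than at the line that indexes into it.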

jowkar avatar Feb 24 '21 08:02 jowkar

Yes, indeed, I can reproduce the bug, and it is related to the change from canis_familiaris to canis_lupus_familiaris in Ensembl. Several scripts need to be modified. I will push the changes here on GitHub soon, but I will not release a new official version of FusionCatcher yet.

In short, these two lines

    ense = options.organism.lower().split('_',1)
    ensembl_organism = ense[0][0]+ense[1]+'_gene_ensembl'

should be replaced with these two lines

    ense = options.organism.lower().split('_')
    ensembl_organism = ense[0][0] + ense[1] + '_gene_ensembl' if len(ense) == 2 else ense[0][0] + ense[1][0] + ense[2] + '_gene_ensembl'

in the following files:

  • get_biotypes.py
  • get_exons_positions.py
  • get_genes_descriptions.py
  • get_hla2.py
  • get_mtrna.py
  • get_paralogs.py
  • get_refseq_ensembl.py
  • get_rrna.py
  • get_trna.py
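As a quick check of the replacement, here is a minimal sketch that wraps the old and new expressions in two functions and compares their output. The function names `old_dataset_name` and `new_dataset_name` are hypothetical and not part of FusionCatcher; only the expressions inside them come from the lines above.

```python
def old_dataset_name(organism):
    """Original logic: split only once, so a three-part name keeps its underscore."""
    ense = organism.lower().split('_', 1)
    return ense[0][0] + ense[1] + '_gene_ensembl'


def new_dataset_name(organism):
    """Replacement logic: handle two-part and three-part organism names."""
    ense = organism.lower().split('_')
    if len(ense) == 2:
        # e.g. homo_sapiens -> hsapiens
        prefix = ense[0][0] + ense[1]
    else:
        # e.g. canis_lupus_familiaris -> clfamiliaris
        # (like the one-line version, this assumes exactly three parts)
        prefix = ense[0][0] + ense[1][0] + ense[2]
    return prefix + '_gene_ensembl'


print(old_dataset_name('canis_lupus_familiaris'))  # clupus_familiaris_gene_ensembl
print(new_dataset_name('canis_lupus_familiaris'))  # clfamiliaris_gene_ensembl
print(new_dataset_name('homo_sapiens'))            # hsapiens_gene_ensembl
```

With the old expression, the three-part name yields a dataset name that still contains an underscore-separated species part, which would explain why the downloads came back empty.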

ndaniel avatar Feb 24 '21 08:02 ndaniel

In get_paralogs.py, it seems the following line needs to be changed as well:

    #org = ense[0][0] + ense[1]
    org = ense[0][0] + ense[1] if len(ense) == 2 else ense[0][0] + ense[1][0] + ense[2]

jowkar avatar Feb 24 '21 15:02 jowkar

@jowkar Indeed, that is correct!

It looks like after these fixes there are still more things to fix.

ndaniel avatar Feb 24 '21 18:02 ndaniel