Liftoff
Parent/child features are broken in v1.6.0; and CDS phase is not transferred
Thanks for the useful tool! I'm running into the following problem:
Error
- Parent features are being suffixed with _0 -- which is breaking parent-child relationships
- CDS phase is not being transferred
Liftoff version
$ liftoff --version
v1.6.0
Example
This is the reference annotation:
chr7 funannotate gene 1722312 1724477 . + . ID=FUN_009367;Name=SAC6;Alias=fimA,fim;
chr7 funannotate mRNA 1722312 1724477 . + . ID=FUN_009367-T1;Parent=FUN_009367;product=Fimbrin, actin-bundling protein;Alias=fimA,fim;Ontology_term=GO:0005515,GO:0051015,GO:0051017;Dbxref=InterPro:IPR001715,InterPro:IPR011992,InterPro:IPR018247,InterPro:IPR036872,InterPro:IPR001589,PFAM:PF00307,InterPro:IPR039959;note=COG:Z,EggNog:ENOG410PGNK,BUSCO:EOG0926140Q;
chr7 funannotate exon 1722312 1722334 . + . ID=FUN_009367-T1.exon1;Parent=FUN_009367-T1;
chr7 funannotate exon 1722386 1722596 . + . ID=FUN_009367-T1.exon2;Parent=FUN_009367-T1;
chr7 funannotate exon 1722646 1723328 . + . ID=FUN_009367-T1.exon3;Parent=FUN_009367-T1;
chr7 funannotate exon 1723390 1724375 . + . ID=FUN_009367-T1.exon4;Parent=FUN_009367-T1;
chr7 funannotate exon 1724446 1724477 . + . ID=FUN_009367-T1.exon5;Parent=FUN_009367-T1;
chr7 funannotate CDS 1722312 1722334 . + 0 ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
chr7 funannotate CDS 1722386 1722596 . + 1 ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
chr7 funannotate CDS 1722646 1723328 . + 0 ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
chr7 funannotate CDS 1723390 1724375 . + 1 ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
chr7 funannotate CDS 1724446 1724477 . + 2 ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
This gene model gets transferred as below:
Utg12698 Liftoff gene 2379 4544 . + . ID=FUN_009367;Name=SAC6;Alias=fimA,fim;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=FUN_009367_0
Utg12698 Liftoff mRNA 2379 4544 . + . ID=FUN_009367-T1;Parent=FUN_009367_0;product=Fimbrin, actin-bundling protein;Alias=fimA,fim;Ontology_term=GO:0005515,GO:0051015,GO:0051017;Dbxref=InterPro:IPR001715,InterPro:IPR011992,InterPro:IPR018247,InterPro:IPR036872,InterPro:IPR001589,PFAM:PF00307,InterPro:IPR039959;note=COG:Z,EggNog:ENOG410PGNK,BUSCO:EOG0926140Q;;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
Utg12698 Liftoff exon 2379 2401 . + . ID=FUN_009367-T1.exon1;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff exon 2453 2663 . + . ID=FUN_009367-T1.exon2;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff exon 2713 3395 . + . ID=FUN_009367-T1.exon3;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff exon 3457 4442 . + . ID=FUN_009367-T1.exon4;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff exon 4513 4544 . + . ID=FUN_009367-T1.exon5;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff CDS 2379 2401 . + . ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff CDS 2453 2663 . + . ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff CDS 2713 3395 . + . ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff CDS 3457 4442 . + . ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698 Liftoff CDS 4513 4544 . + . ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
The problem is that the Parent= field of the mRNA feature gets _0 appended, so it no longer matches the ID= of the gene feature. The same thing happens on the exon/CDS features: their Parent= fields no longer match the mRNA ID= field.
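One way to spot this kind of breakage is to check that every Parent= value resolves to an ID= somewhere in the same file. The following is only a minimal sketch, assuming tab-separated GFF3 with key=value attributes in column 9; `parse_attributes` and `find_orphans` are hypothetical helper names, not part of Liftoff.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (hypothetical helper)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def find_orphans(gff_lines):
    """Return (child_id, parent_id) pairs whose parent ID is not defined."""
    ids = set()
    links = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # GFF3 allows several comma-separated parents
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                links.append((attrs.get("ID"), parent))
    return [(child, parent) for child, parent in links if parent not in ids]
```

Run against the lifted output above, this should report the mRNA (Parent=FUN_009367_0) and every exon/CDS (Parent=FUN_009367-T1_0) as orphans, while the reference annotation should produce an empty list.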
Also, is there a reason the phase is not transferred?
Again -- thanks for the useful tool!
Hi, thanks for bringing this to my attention. The appending of _0 has been fixed in the master branch. Transferring the phase is on my radar to be added as well, but for now there is a solution posted in #67 to add the phase back in after the lift over.
Great, thanks for the parent/child fix. I'll give it a whirl.
If it helps, this is how I add the CDS phase by looping through the CDS coordinates (assuming the first CDS starts in phase 0 and the features are sorted in transcript order):
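current_phase = 0
for cds_start, cds_end in cds_features:  # (start, end) pairs, in transcript order
    phase = current_phase  # phase to write out for this CDS
    # advance: number of bases to skip at the start of the next CDS
    current_phase = (current_phase - (int(cds_end) - int(cds_start) + 1)) % 3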
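For anyone who wants to try the loop above on real coordinates, here is a self-contained sketch of the same recurrence. The function name `cds_phases` and its parameters are hypothetical, and the minus-strand handling (walking the CDS segments from highest to lowest start) is an addition that was not part of the original snippet.

```python
def cds_phases(intervals, strand="+", first_phase=0):
    """Return GFF3 phases for CDS (start, end) intervals, in input order.

    Assumes 1-based inclusive coordinates, non-overlapping segments of a
    single transcript, and a known phase for the first CDS (default 0).
    """
    # Transcript order: ascending on '+', descending on '-'
    ordered = sorted(intervals, reverse=(strand == "-"))
    phases = {}
    phase = first_phase
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases to skip at the start of the next segment to reach a codon start
        phase = (phase - length) % 3
    return [phases[iv] for iv in intervals]


# CDS coordinates of FUN_009367-T1 from the reference annotation
reference_cds = [
    (1722312, 1722334),
    (1722386, 1722596),
    (1722646, 1723328),
    (1723390, 1724375),
    (1724446, 1724477),
]
print(cds_phases(reference_cds))  # [0, 1, 0, 1, 2]
```

The printed phases match column 8 of the reference CDS lines, and because the lifted segments have the same lengths, the same values apply to the Liftoff output.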
@agshumate just confirming that on current master the parent/child issue is fixed. Thank you.