Parent/child features are broken in v1.6.0; and CDS phase is not transferred

Open • nextgenusfs opened this issue 3 years ago • 3 comments

Thanks for the useful tool! I'm running into the following problem:

Error

  1. Parent features are being suffixed with _0 -- which is breaking parent-child relationships
  2. CDS phase is not being transferred

Liftoff version

$ liftoff --version
v1.6.0

Example

This is the reference annotation:

chr7    funannotate     gene    1722312 1724477 .       +       .       ID=FUN_009367;Name=SAC6;Alias=fimA,fim;
chr7    funannotate     mRNA    1722312 1724477 .       +       .       ID=FUN_009367-T1;Parent=FUN_009367;product=Fimbrin, actin-bundling protein;Alias=fimA,fim;Ontology_term=GO:0005515,GO:0051015,GO:0051017;Dbxref=InterPro:IPR001715,InterPro:IPR011992,InterPro:IPR018247,InterPro:IPR036872,InterPro:IPR001589,PFAM:PF00307,InterPro:IPR039959;note=COG:Z,EggNog:ENOG410PGNK,BUSCO:EOG0926140Q;
chr7    funannotate     exon    1722312 1722334 .       +       .       ID=FUN_009367-T1.exon1;Parent=FUN_009367-T1;
chr7    funannotate     exon    1722386 1722596 .       +       .       ID=FUN_009367-T1.exon2;Parent=FUN_009367-T1;
chr7    funannotate     exon    1722646 1723328 .       +       .       ID=FUN_009367-T1.exon3;Parent=FUN_009367-T1;
chr7    funannotate     exon    1723390 1724375 .       +       .       ID=FUN_009367-T1.exon4;Parent=FUN_009367-T1;
chr7    funannotate     exon    1724446 1724477 .       +       .       ID=FUN_009367-T1.exon5;Parent=FUN_009367-T1;
chr7    funannotate     CDS     1722312 1722334 .       +       0       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
chr7    funannotate     CDS     1722386 1722596 .       +       1       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
chr7    funannotate     CDS     1722646 1723328 .       +       0       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
chr7    funannotate     CDS     1723390 1724375 .       +       1       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;
chr7    funannotate     CDS     1724446 1724477 .       +       2       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1;

This gene model gets transferred as below:

Utg12698       Liftoff gene    2379    4544    .       +       .       ID=FUN_009367;Name=SAC6;Alias=fimA,fim;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=FUN_009367_0
Utg12698       Liftoff mRNA    2379    4544    .       +       .       ID=FUN_009367-T1;Parent=FUN_009367_0;product=Fimbrin, actin-bundling protein;Alias=fimA,fim;Ontology_term=GO:0005515,GO:0051015,GO:0051017;Dbxref=InterPro:IPR001715,InterPro:IPR011992,InterPro:IPR018247,InterPro:IPR036872,InterPro:IPR001589,PFAM:PF00307,InterPro:IPR039959;note=COG:Z,EggNog:ENOG410PGNK,BUSCO:EOG0926140Q;;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
Utg12698       Liftoff exon    2379    2401    .       +       .       ID=FUN_009367-T1.exon1;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff exon    2453    2663    .       +       .       ID=FUN_009367-T1.exon2;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff exon    2713    3395    .       +       .       ID=FUN_009367-T1.exon3;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff exon    3457    4442    .       +       .       ID=FUN_009367-T1.exon4;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff exon    4513    4544    .       +       .       ID=FUN_009367-T1.exon5;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff CDS     2379    2401    .       +       .       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff CDS     2453    2663    .       +       .       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff CDS     2713    3395    .       +       .       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff CDS     3457    4442    .       +       .       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0
Utg12698       Liftoff CDS     4513    4544    .       +       .       ID=FUN_009367-T1.cds;Parent=FUN_009367-T1_0;extra_copy_number=0

The problem is that the Parent= field of the mRNA feature gets _0 appended, so it no longer matches the ID= of the gene feature. The same thing happens on the exon/CDS features: their Parent= values no longer match the mRNA ID= field.
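
For anyone who wants to check their own output for this mismatch, here is a minimal sketch that reports every Parent= value with no matching ID= in the same file. The function names `parse_attributes` and `find_dangling_parents` are made up for this illustration and are not part of Liftoff; the example lines are shortened copies of the output above.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict, skipping empty fields."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def find_dangling_parents(gff3_lines):
    """Return (ID, Parent) pairs whose Parent matches no ID in the input."""
    records = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a feature line
        records.append(parse_attributes(columns[8]))
    known_ids = {attrs["ID"] for attrs in records if "ID" in attrs}
    dangling = []
    for attrs in records:
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent and parent not in known_ids:
                dangling.append((attrs.get("ID"), parent))
    return dangling


# Shortened versions of the lifted-over lines shown above.
example = [
    "\t".join(["Utg12698", "Liftoff", "gene", "2379", "4544", ".", "+", ".",
               "ID=FUN_009367;Name=SAC6"]),
    "\t".join(["Utg12698", "Liftoff", "mRNA", "2379", "4544", ".", "+", ".",
               "ID=FUN_009367-T1;Parent=FUN_009367_0"]),
    "\t".join(["Utg12698", "Liftoff", "exon", "2379", "2401", ".", "+", ".",
               "ID=FUN_009367-T1.exon1;Parent=FUN_009367-T1_0"]),
]
for feature_id, parent in find_dangling_parents(example):
    print(f"{feature_id}: Parent={parent} matches no ID")
```

On a file with intact parent/child links the returned list should be empty.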

Also, is there a reason the phase is not transferred?

Again -- thanks for the useful tool!

nextgenusfs • Apr 08 '21 01:04

Hi, thanks for bringing this to my attention. The appending of _0 has been fixed in the master branch. Transferring the phase is on my radar to be added as well, but for now there is a solution posted in #67 to add the phase back in after the lift over.

agshumate • Apr 08 '21 14:04

Great, thanks for the parent/child fix, I'll give it a whirl.

If it helps, this is how I add the CDS phase by looping through the CDS coordinates (assuming the first CDS starts at phase 0 and the features are properly sorted):

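  current_phase = 0  # assumes the first CDS segment starts in frame
  for cds_start, cds_end in cds_features:  # assumed: (start, end) pairs in transcription order
      phase = current_phase  # phase to write into column 8 of this CDS line
      # bases consumed by this segment determine the phase of the next one
      current_phase = (current_phase - (int(cds_end) - int(cds_start) + 1)) % 3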
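
The same idea as a self-contained sketch, run on the lifted-over CDS coordinates from the example above. The function name `cds_phases` and its parameters are hypothetical, and it assumes the first segment in transcription order starts at phase 0, which will not hold for partial gene models.

```python
def cds_phases(segments, strand="+", first_phase=0):
    """Return the GFF3 phase for each CDS segment, in the order given.

    segments: list of (start, end) tuples, 1-based and inclusive.
    """
    # Walk segments in transcription order: ascending on +, descending on -.
    order = sorted(range(len(segments)),
                   key=lambda i: segments[i][0],
                   reverse=(strand == "-"))
    phases = [None] * len(segments)
    phase = first_phase
    for i in order:
        start, end = segments[i]
        phases[i] = phase
        length = end - start + 1
        # Phase of the next segment: bases still needed to finish the codon.
        phase = (phase - length) % 3
    return phases


# CDS coordinates on Utg12698 from the lifted-over output.
lifted = [(2379, 2401), (2453, 2663), (2713, 3395), (3457, 4442), (4513, 4544)]
print(cds_phases(lifted))  # [0, 1, 0, 1, 2]
```

The result matches the phases in column 8 of the reference annotation (0, 1, 0, 1, 2), as expected since the segment lengths are unchanged by the lift over.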

nextgenusfs • Apr 08 '21 15:04

@agshumate just confirming that the parent/child problem is fixed on current master. Thank you.

nextgenusfs • Apr 08 '21 16:04