
CDS entries in GFF3 file are not merged to a CompoundLocation

Open mikdur opened this issue 10 years ago • 5 comments

Given a gene that looks something like this in GFF3 notation:

##gff-version 3
scf_001 maker   gene    36837   38790   .       +       .       ID=BN869_G00000007;Name=BN869_G00000007;
scf_001 maker   mRNA    36837   38790   .       +       .       ID=BN869_T00000007_1;Parent=BN869_G00000007;Name=BN869_T00000007_1;
scf_001 maker   exon    36837   37491   .       +       .       ID=BN869_T00000007_1:exon:0;Parent=BN869_T00000007_1;
scf_001 maker   exon    37547   38790   .       +       .       ID=BN869_T00000007_1:exon:1;Parent=BN869_T00000007_1;
scf_001 maker   CDS     36837   37491   .       +       0       ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
scf_001 maker   CDS     37547   38790   .       +       2       ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;

The GFF parser fails to join the two CDSs with the same ID into a single feature with a CompoundLocation. As a result, GenBank and EMBL files produced when merging (and flattening) GFF3 annotations get multiple CDS features where there should be a single CDS whose position is a join, e.g.:

FT   CDS             join(36837..37491,37547..38790)
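
To make the expected behaviour concrete, here is a minimal, standard-library-only sketch of the grouping step. It is not the bcbb parser's code and it does not build a Biopython CompoundLocation; the helper names `parse_attributes`, `merge_cds_by_id` and `location_string` are made up for illustration. It assumes that CDS rows sharing an ID on the same sequence and strand belong to one feature, and it only formats the location string a flat file would carry.

```python
from collections import defaultdict

# The two CDS rows from the example above. Real GFF3 is tab-delimited;
# splitting on any whitespace also accepts the space-separated copy shown here.
GFF3_TEXT = """\
##gff-version 3
scf_001 maker CDS 36837 37491 . + 0 ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
scf_001 maker CDS 37547 38790 . + 2 ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
"""


def parse_attributes(field):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'} (hypothetical helper)."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def merge_cds_by_id(gff3_text):
    """Group CDS segments by (seqid, ID, strand), sorted by start coordinate."""
    segments = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split(None, 8)
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        cds_id = parse_attributes(cols[8]).get("ID")
        if cds_id is None:
            continue  # without an ID there is nothing to group on
        segments[(cols[0], cds_id, cols[6])].append((int(cols[3]), int(cols[4])))
    return {key: sorted(parts) for key, parts in segments.items()}


def location_string(parts, strand):
    """Format 1-based inclusive segments as a GenBank/EMBL location."""
    spans = ",".join(f"{start}..{end}" for start, end in parts)
    location = f"join({spans})" if len(parts) > 1 else spans
    return f"complement({location})" if strand == "-" else location


for (seqid, cds_id, strand), parts in merge_cds_by_id(GFF3_TEXT).items():
    print(f"FT   CDS             {location_string(parts, strand)}")
```

On the two rows above this prints the single `join(...)` line shown in the example. A real fix would additionally have to carry over the phase column and the qualifiers of each segment, which this sketch ignores.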

mikdur avatar Jan 14 '15 21:01 mikdur

@chapmanb is there an easy solution for this? I stumbled over this as well when I tried to integrate protein sequences into CDS records.

bgruening avatar Jun 16 '15 18:06 bgruening

Björn and Mikael; Sorry about leaving this for so long. I've been meaning to tackle it forever. Have you tried using GFFutils?

https://github.com/daler/gffutils

I've been pointing everyone at Ryan's work as it's better and more up to date than this library. The goal has been to merge any missing functionality this library has there. Hopefully it'll handle your case better.

chapmanb avatar Jun 18 '15 15:06 chapmanb

@chapmanb yes, I'm currently developing some Galaxy integration for gffutils, but as far as I know it lacks the conversion features. You cannot convert a gff-sqlite database to GenBank, can you?

bgruening avatar Jun 18 '15 15:06 bgruening

I think I have some code that does the merge, albeit maybe not in an optimal way. I'll check it and see whether it is fit to be merged into a suitable place.

Cheers, Mikael



mikdur avatar Jun 18 '15 15:06 mikdur

@mikdur this would be great!

bgruening avatar Jun 18 '15 16:06 bgruening