bcbb
CDS entries in GFF3 file are not merged to a CompoundLocation
Given a gene that looks something like this in GFF3 notation:
##gff-version 3
scf_001 maker gene 36837 38790 . + . ID=BN869_G00000007;Name=BN869_G00000007;
scf_001 maker mRNA 36837 38790 . + . ID=BN869_T00000007_1;Parent=BN869_G00000007;Name=BN869_T00000007_1;
scf_001 maker exon 36837 37491 . + . ID=BN869_T00000007_1:exon:0;Parent=BN869_T00000007_1;
scf_001 maker exon 37547 38790 . + . ID=BN869_T00000007_1:exon:1;Parent=BN869_T00000007_1;
scf_001 maker CDS 36837 37491 . + 0 ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
scf_001 maker CDS 37547 38790 . + 2 ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
The GFF parser fails to join the two CDSs with the same ID into a single feature with a CompoundLocation. As a result, GenBank and EMBL files produced when merging (and flattening) GFF3 annotations get multiple CDS features where there should be a single CDS whose position is a join, e.g.:
FT CDS join(36837..37491,37547..38790)
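To make the expected behaviour concrete, here is a minimal sketch of the merge using only the Python standard library. It is not the bcbb GFF parser's code, and the helper names (`parse_attributes`, `group_cds_by_id`, `insdc_location`) are hypothetical. It assumes CDS rows that belong together share seqid, ID and strand, and it splits columns on any whitespace so that it also accepts the space-separated rows as pasted in this issue.

```python
# Sketch only: group GFF3 CDS rows by ID and render one joined location.
# All helper names here are hypothetical, not part of bcbb or Biopython.

EXAMPLE = """\
##gff-version 3
scf_001 maker exon 36837 37491 . + . ID=BN869_T00000007_1:exon:0;Parent=BN869_T00000007_1;
scf_001 maker CDS 36837 37491 . + 0 ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
scf_001 maker CDS 37547 38790 . + 2 ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
"""


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def group_cds_by_id(gff3_text):
    """Map (seqid, ID, strand) to a list of 1-based inclusive (start, end) parts."""
    groups = {}
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split(None, 8)  # assumption: whitespace- or tab-separated
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        cds_id = parse_attributes(cols[8]).get("ID")
        if cds_id is None:
            continue  # this sketch ignores CDS rows without an ID
        key = (cols[0], cds_id, cols[6])
        groups.setdefault(key, []).append((int(cols[3]), int(cols[4])))
    return groups


def insdc_location(parts, strand):
    """Render the parts as a GenBank/EMBL style location string."""
    spans = ["%d..%d" % part for part in sorted(parts)]
    location = spans[0] if len(spans) == 1 else "join(%s)" % ",".join(spans)
    return "complement(%s)" % location if strand == "-" else location


if __name__ == "__main__":
    for (seqid, cds_id, strand), parts in group_cds_by_id(EXAMPLE).items():
        print(cds_id, insdc_location(parts, strand))
```

With Biopython, each part would presumably become a `FeatureLocation` (0-based start) and the sorted list would be wrapped in a `CompoundLocation` on a single CDS `SeqFeature`; the sketch stops at the location string to stay dependency-free.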
@chapmanb is there an easy solution for this? I ran into this as well when I tried to add protein sequences to CDS records.
Björn and Mikael; Sorry about leaving this for so long. I've been meaning to tackle it forever. Have you tried using GFFutils:
https://github.com/daler/gffutils
I've been pointing everyone at Ryan's work as it's better and more up to date than this library. The goal has been to merge any missing functionality this library has there. Hopefully it'll handle your case better.
@chapmanb yes, I'm currently developing some Galaxy integration for gffutils, but as far as I know it lacks the conversion features. You cannot convert a gff-sqlite database to GenBank, can you?
I think I have some code that does the merge, albeit maybe not in an optimal way. I'll check it and see if it fits to be merged into a suitable place.
Cheers, Mikael
Sent from a crippled computer (a.k.a a phone)
(In reply to Brad Chapman's comment above, 18 Jun 2015, 17:40.)
@mikdur this would be great!