Bio::Tools::GFF _gffX_string update

Open Juke34 opened this issue 4 years ago • 1 comments

Hi, I would like to push some updates to the methods _gff2_string and _gff25_string to remove some inconsistencies with the format specifications (here is a review of the specifications that I have done). Currently, the difference between the two methods is that _gff25_string puts Target attributes first in the attribute list.

point 1) As the order shouldn't matter, I was wondering if we could remove the attribute sorting. The code is quite old (2004). I'm hesitant, though, because of a comment saying "# need to put the target info before other tag/value pairs - mw", and because the description of the _gff25_string method says: "Function: To get a format of GFF that is peculiar to Gbrowse/Bio::DB::GFF". But why have a general method handle a case specific to Gbrowse? I guess Gbrowse has fixed this peculiarity since then...
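To make the ordering question concrete, here is a rough Python sketch of the two behaviours. It is not the BioPerl code: `order_tags` is a hypothetical helper, and it assumes that the non-sorting variant simply keeps the tags in the order they are given.

```python
def order_tags(tags, target_first=False):
    """Return tag names in output order.

    Hypothetical helper, not part of Bio::Tools::GFF.
    target_first=True mimics the behaviour described for _gff25_string
    (Target moved to the front); otherwise the input order is kept
    (an assumption about what "no sorting" would mean).
    """
    if not target_first:
        return list(tags)
    # Stable partition: Target first, the rest in their original order.
    targets = [t for t in tags if t == "Target"]
    others = [t for t in tags if t != "Target"]
    return targets + others


tags = ["Note", "Target", "ID"]
print(order_tags(tags))                     # order untouched
print(order_tags(tags, target_first=True))  # Target moved to the front
```

If attribute order really carries no meaning for downstream parsers, the two outputs are equivalent and the special case could go.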

Both produce an attribute list like this (note the two spaces before the semicolon; one would be enough...):

tag1 "value 1"  ; tag2 value2

The _gff2_string method follows the GFF2 specification. About the attributes, the specification says: "From version 2 onwards, the attribute field must have an tag value structure following the syntax used within objects in a .ace file, flattened onto one line by semicolon separators." **point 2) The specification does not ask for spaces around the semicolon; should we remove them?** I guess that, to avoid potential compatibility issues, it's easier to keep it as it is...

The _gff25_string method is similar to _gff2_string but should follow the GTF2 format (GFF2.5 = GTF). In that sense, the attributes must look like:

tag1 "value 1"; tag2 value2;
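The two attribute styles can be compared with a small Python sketch. This is only an illustration of the target output strings, not the BioPerl implementation: the function names are hypothetical, and the quoting rule (quote any value containing a character other than a letter, digit or underscore) is an assumption chosen to reproduce the examples above.

```python
import re


def quote(value):
    # Assumed rule: quote values containing anything other than
    # letters, digits or underscores (for example a space).
    text = str(value)
    return f'"{text}"' if re.search(r"[^A-Za-z0-9_]", text) else text


def gff2_style(pairs):
    # Pairs joined by " ; " with no terminator after the last pair.
    return " ; ".join(f"{tag} {quote(val)}" for tag, val in pairs)


def gtf_style(pairs):
    # Every pair is terminated by ";" and pairs are separated by one space.
    return " ".join(f"{tag} {quote(val)};" for tag, val in pairs)


pairs = [("tag1", "value 1"), ("tag2", "value2")]
print(gff2_style(pairs))
print(gtf_style(pairs))
```

Note that `gff2_style` here emits a single space before the semicolon, i.e. the cleaned-up form rather than the current double-space output.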

point 3) For me this is the most important point: the _gff25_string method must produce GTF2/GFF2.5 format, and not be a tweak of the _gff2_string method adapted to the peculiar GBrowse case.

I'm pinging @fangly @bosborne @hyphaltip @cjfields because I have seen that you have worked on this package at some point.

I will adapt my modifications according to your feedback. Best regards,

Jacques

Juke34, Nov 06 '19 16:11

@Juke34 Based on the documentation I think it would be good to have you involved with the GFF specification discussions, though those have gone a bit dormant in the last few years.

I'm all for updating to ensure the specifications are in place. @scottcain would you have any comments on the above, as it could affect GBrowse? Maybe it doesn't matter if everyone is moving to using JBrowse and/or Bio::DB::SeqFeature?

cjfields, Nov 06 '19 16:11