bioperl-live
                                
bp_genbank2gff3.pl parsing issue
Hello, when parsing a .gbf file generated by tbl2asn (for example, one being prepared for submission), the ACCESSION field for each locus is empty because GenBank has not assigned it yet. bp_genbank2gff3.pl then assigns the value "unknown" as the region ID for every locus:
Contig_1 GenBank region 1 1627 . + 1 ID=unknown;Dbxref=BioProject:###########;Name=unknown;Note=Clostridium sporogenes.,clade I;isolate=2007;mol_type=genomic DNA;organism=Clostridium sporogenes
This creates a problem for downstream parsing, because all the nucleotide FASTA headers at the bottom of the file are identical (>unknown). Can the script be modified to either number them uniquely or else use the LOCUS value when an ACCESSION is not available?
The second option (using the LOCUS) may be easier to implement. I will have to see whether this can be done before the next release.
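To make the two options concrete, here is a minimal sketch of the fallback logic in Python. It is not the Perl code in bp_genbank2gff3.pl; the function name `choose_region_id` and the `seen` dictionary are hypothetical, and it assumes a missing ACCESSION shows up as either an empty string or the "unknown" placeholder.

```python
def choose_region_id(accession, locus, seen):
    """Pick a region ID: ACCESSION if assigned, else LOCUS, numbered if repeated.

    Hypothetical helper for illustration only; `seen` maps each candidate ID
    to how many times it has been handed out so far.
    """
    # Assumption: an unassigned ACCESSION is empty or the literal "unknown".
    candidate = accession if accession and accession != "unknown" else locus
    if not candidate:
        # Neither field is usable, so fall back to the placeholder.
        candidate = "unknown"
    count = seen.get(candidate, 0) + 1
    seen[candidate] = count
    # The first use keeps the plain name; repeats get a numeric suffix.
    return candidate if count == 1 else f"{candidate}_{count}"


seen = {}
records = [("", "Contig_1"), ("unknown", "Contig_2"), ("", "")]
ids = [choose_region_id(acc, locus, seen) for acc, locus in records]
print(ids)
```

With this approach the LOCUS name is preferred whenever the ACCESSION is missing, and numbering is applied only as a last resort when an ID repeats. The suffix scheme is deliberately simple and does not guard against a real LOCUS that already ends in a matching `_2`-style suffix.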
Sounds good.