
Retrieving additional uniprot codes

Open gundizalv opened this issue 3 years ago • 0 comments

Hi,

Looking at your prokka-uniprot_to_fasta_db script, I wonder whether I can retrieve KEGG gene accessions instead of COG accessions. More than 20 million TrEMBL entries have linked KEGG genes, while barely half as many have a linked COG.

This is the relevant piece of your code:

my $ec = '';
my $prod = '';
my $cog = '';

if (1) {
  # [ 'eggNOG', 'COG4799', 'LUCA' ]
  for my $dr ( @{ $entry->DRs->list } ) {
    # print Dumper($dr);
    if ($dr->[1] =~ m/^(COG\d+)$/) {
      $cog = $1;
      last;
    }
  }
}

Instead of matching the DR line 'eggNOG', 'COG...', 'bacteria' and retrieving the COG accession, is it possible to match a KEGG DR line, for example 'KEGG'; 'vg:2947773'; -, and pick up the KEGG code? What should I modify? These codes would be very useful to me.
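To make the question concrete, here is a rough sketch of the kind of change I have in mind. It is only a guess, not tested against the real script: it assumes each DR record is an array ref shaped like [ 'KEGG', 'vg:2947773', '-' ], mirroring the [ 'eggNOG', 'COG4799', 'LUCA' ] comment above. The helper name first_kegg_id and the sample @drs list are made up for illustration; in the actual script the list would presumably come from @{ $entry->DRs->list }.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of prokka): return the first KEGG gene ID
# found among a list of DR records, or '' if there is none.
# Assumption: each record is an array ref like [ 'KEGG', 'vg:2947773', '-' ].
sub first_kegg_id {
    my ($drs) = @_;
    for my $dr (@$drs) {
        # Field 0 is assumed to hold the database name.
        next unless defined $dr->[0] && $dr->[0] eq 'KEGG';
        # Field 1 is assumed to hold the ID, shaped like "<organism>:<gene>".
        if (defined $dr->[1] && $dr->[1] =~ m/^([A-Za-z0-9_]+:\S+)$/) {
            return $1;
        }
    }
    return '';
}

# Stand-in data for illustration only.
my @drs = (
    [ 'eggNOG', 'COG4799',    'LUCA' ],
    [ 'KEGG',   'vg:2947773', '-'    ],
);

my $kegg = first_kegg_id(\@drs);
print "KEGG=$kegg\n";
```

Unlike the COG loop, this checks the database name in the first field before looking at the ID, since a KEGG ID has no fixed prefix like COG to match on.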

gundizalv, May 17 '21 11:05