Torsten Seemann

Results 326 comments of Torsten Seemann

@sagarutturkar thanks for the tip about PLSDB!

Turns out there are 1.1 million unique proteins in all RefSeq plasmids. Clustered down to about 250,000. That's way bigger than the 22,000 core chromosomal DB I am using!

That's after I excluded hypotheticals. BUT it turns out that those stats are all wrong, and include lots of chromosomes. WHY? https://ftp.ncbi.nlm.nih.gov/refseq/release/plasmid/ has all the CDS of chromosomes in it...

I need a database of non-redundant **plasmid-specific proteins** and corresponding `/gene`, `/EC_number` (and `/COG` if possible)
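As a rough illustration of the "exclude hypotheticals, keep unique proteins" step, here is a minimal sketch. It only removes exact duplicate sequences, which is much weaker than the clustering mentioned above, and it does nothing to tell plasmid proteins from chromosomal ones. The function names `read_fasta` and `nonredundant`, and matching on the word "hypothetical" in the FASTA header, are assumptions made for this example.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def nonredundant(records, skip_word="hypothetical"):
    """Drop records whose header contains skip_word, then keep the
    first record seen for each distinct sequence (exact matches only)."""
    seen = {}
    for header, seq in records:
        if skip_word in header.lower():
            continue
        seen.setdefault(seq, header)  # first header wins
    return [(header, seq) for seq, header in seen.items()]
```

Usage would be something like `nonredundant(read_fasta(open("proteins.faa")))`, where `proteins.faa` is a placeholder file name.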

Thanks for the report! It's strange - the merR CDS does not have a protein ID. It's because it's a pseudo-gene, but it is not using the `/pseudo` tag but...

Looks like it is new: http://www.insdc.org/documents/feature_table.html The confusion is that it is missing the `/pseudo` tag to go along with it?

```
Qualifier /pseudo
Definition indicates that this feature is...
```

For now you can edit the GBK file and change `/pseudogene="unknown"` to `/pseudo`
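If you would rather script that edit than do it by hand, a minimal sketch is below. It is a plain text substitution, not a GenBank parser, and the function names `pseudogene_to_pseudo` and `fix_gbk_file` are made up for this example. It assumes the qualifier appears exactly as `/pseudogene="unknown"` in the file.

```python
from pathlib import Path

OLD = '/pseudogene="unknown"'
NEW = "/pseudo"


def pseudogene_to_pseudo(gbk_text: str) -> str:
    """Replace every /pseudogene="unknown" qualifier with a bare /pseudo."""
    return gbk_text.replace(OLD, NEW)


def fix_gbk_file(src: str, dst: str) -> int:
    """Write a fixed copy of src to dst; return how many qualifiers changed."""
    text = Path(src).read_text()
    count = text.count(OLD)
    Path(dst).write_text(pseudogene_to_pseudo(text))
    return count
```

Writing to a separate output path keeps the original GBK file untouched in case the substitution catches something unexpected.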

If you only have one contig then you can set the `--locustag` parameter to be the contig name yourself (and don't use `--compliant` which renames contigs). If you have lots...

@SilentGene given your GitHub username I would have thought you would want to keep your gene source private! ;-) I think the best way to solve this would be to...

I will consider it, but Prokka is not really designed for metagenomes.