
Suppressed Sequences included in RefSeq_viral_genomes_v2.4.0.fa.gz

Open selkamand opened this issue 7 months ago • 1 comments

Hi, thanks so much for your work on this tool!

Just wanted to flag that RefSeq_viral_genomes_v2.4.0.fa.gz includes several sequences that NCBI has since marked as 'suppressed'. This suggests the sequences were poor quality or potentially contaminated, so they could negatively affect Arriba's viral detection. Would it be possible to remove these sequences from RefSeq_viral_genomes_v2.4.0.fa.gz in future versions?

A complete list of the problematic sequences is below:

NC_027359.1_Propionibacterium_phage_PHL082M00-complete_genome
NC_027991.1_Staphylococcus_phage_SA1-complete_genome
NC_029050.1_Salmonella_phage_21-complete_genome
NC_029072.1_Salmonella_phage_19-complete_genome
NC_035203.1_Grapevine_virus_T_isolate_Cho_replicase_ORF1-TGB1_ORF2-TGB2_ORF3-TGB3_ORF4-and_CP_ORF5_genes-complete_cds
NC_023591.1_Mycobacterium_phage_Adler-complete_genome
NC_024711.1_Uncultured_crAssphage-complete_genome
NC_026813.1_Fusarium_graminearum_hypovirus_2_isolate_FgHV2_JS16-complete_genome
NC_002669.1_Lactococcus_prophage_bIL310-complete_genome
NC_002671.1_Lactococcus_prophage_bIL312-complete_genome
NC_002670.1_Lactococcus_prophage_bIL311-complete_genome
NC_001847.1_Bovine_herpesvirus_1-complete_genome
NC_007045.1_Staphylococcus_phage_PT1028-complete_genome
NC_041920.1_UNVERIFIED_Escherichia_phage_HP3-complete_genome
NC_042059.1_Halobacterium_phage_phiH_T4-T4-and_T_down_LX1_down_genes-complete_sequence_and_orf75_T_down_LX3_down_gene-complete_cds
NC_043055.1_Caprine_herpesvirus_1_strain_E_CH_glycoprotein_B_gene-complete_cds
NC_043057.1_Cervid_herpesvirus_2_strain_Salla_82_glycoprotein_E_US8_gene-partial_cds
NC_043229.1_Johnston_Atoll_virus_isolate_LBJ_polymerase_PB1_PB1_gene-complete_cds
NC_043230.1_Johnston_Atoll_virus_isolate_LBJ_hemagglutinin_HA_gene-complete_cds
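
In case it helps, here is a rough sketch of how the file could be filtered in the meantime. It is not part of Arriba; the function names are made up for illustration, and it assumes each FASTA header starts with the RefSeq accession (e.g. `>NC_027359.1_...`), which I have not checked against every record in the file.

```python
import gzip
import re

# Accessions taken from the list above.
SUPPRESSED = {
    "NC_027359.1", "NC_027991.1", "NC_029050.1", "NC_029072.1",
    "NC_035203.1", "NC_023591.1", "NC_024711.1", "NC_026813.1",
    "NC_002669.1", "NC_002671.1", "NC_002670.1", "NC_001847.1",
    "NC_007045.1", "NC_041920.1", "NC_042059.1", "NC_043055.1",
    "NC_043057.1", "NC_043229.1", "NC_043230.1",
}

# Assumption: the header begins with an accession such as "NC_027359.1".
ACCESSION_RE = re.compile(r"^>?\s*([A-Z]{2}_\d+\.\d+)")


def accession_of(header):
    """Return the leading accession of a FASTA header, or None if absent."""
    match = ACCESSION_RE.match(header)
    return match.group(1) if match else None


def filter_fasta(lines, suppressed):
    """Yield FASTA lines, skipping records whose accession is suppressed."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # A header line decides whether the whole record is kept.
            keep = accession_of(line) not in suppressed
        if keep:
            yield line


def filter_fasta_gz(src, dst, suppressed=SUPPRESSED):
    """Copy a gzipped FASTA from src to dst without suppressed records."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(filter_fasta(fin, suppressed))
```

Calling `filter_fasta_gz("RefSeq_viral_genomes_v2.4.0.fa.gz", "filtered.fa.gz")` would write a copy without the listed records; the output filename is just a placeholder.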

selkamand, Nov 13 '23 04:11