aws-indexes
human herpesvirus 2 missing from database
The Human herpesvirus 2 sequence is absent from the Viral Kraken Database (and all the databases derived from it). This is strange because:
- all other non-human herpesvirus 2 sequences are present in the Viral Kraken Database (see code below)
- Human herpesvirus 2 sequence "NC_001798.2" is present in the Viral NCBI RefSeq Database (see code below)
- Human herpesvirus 2 is quite an important human virus; I dare say it MUST be present in the Kraken database.
Is there a specific reason why it is missing? Here is some simple code for reproducibility:
# get krakendb viral taxa
$ wget https://genome-idx.s3.amazonaws.com/kraken/viral_20231009/library_report.tsv
$ grep "herpesvirus 2" library_report.tsv | cut -f 2 | sort > krakendb.txt
# get RefSeq viral taxa
$ wget https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.1.genomic.fna.gz
$ zgrep "^>" viral.1.1.genomic.fna.gz | grep "herpesvirus 2" | sort > viralgenomic.txt
# find differences
$ diff -y krakendb.txt viralgenomic.txt
>NC_001350.1 Saimiriine herpesvirus 2 complete genome >NC_001350.1 Saimiriine herpesvirus 2 complete genome
>NC_001650.2 Equid herpesvirus 2 strain 86/67, complete genom >NC_001650.2 Equid herpesvirus 2 strain 86/67, complete genom
> >NC_001798.2 Human herpesvirus 2 strain HG52, complete genome
>NC_002229.3 Gallid herpesvirus 2, complete genome >NC_002229.3 Gallid herpesvirus 2, complete genome
>NC_003521.1 Panine herpesvirus 2 strain Heberling, complete >NC_003521.1 Panine herpesvirus 2 strain Heberling, complete
>NC_006560.1 Cercopithecine herpesvirus 2, complete genome >NC_006560.1 Cercopithecine herpesvirus 2, complete genome
>NC_007646.1 Ovine herpesvirus 2 strain BJ1035, complete geno >NC_007646.1 Ovine herpesvirus 2 strain BJ1035, complete geno
>NC_007653.1 Papiine herpesvirus 2, complete genome >NC_007653.1 Papiine herpesvirus 2, complete genome
>NC_008210.1 Ranid herpesvirus 2 strain ATCC VR-568, complete >NC_008210.1 Ranid herpesvirus 2 strain ATCC VR-568, complete
>NC_019495.1 Cyprinid herpesvirus 2 strain ST-J1, complete ge >NC_019495.1 Cyprinid herpesvirus 2 strain ST-J1, complete ge
>NC_020231.1 Caviid herpesvirus 2 strain 21222, complete geno >NC_020231.1 Caviid herpesvirus 2 strain 21222, complete geno
>NC_024382.1 Alcelaphine herpesvirus 2 isolate topi-AlHV-2, c >NC_024382.1 Alcelaphine herpesvirus 2 isolate topi-AlHV-2, c
>NC_036579.1 Ictalurid herpesvirus 2 strain 760/94, complete >NC_036579.1 Ictalurid herpesvirus 2 strain 760/94, complete
>NC_038265.1 Porcine lymphotropic herpesvirus 2 isolate 568 l >NC_038265.1 Porcine lymphotropic herpesvirus 2 isolate 568 l
>NC_038860.1 Pongine herpesvirus 2 (Orangutan herpesvirus) gB >NC_038860.1 Pongine herpesvirus 2 (Orangutan herpesvirus) gB
>NC_043042.1 Acipenserid herpesvirus 2 strain SRWSHV, partial >NC_043042.1 Acipenserid herpesvirus 2 strain SRWSHV, partial
>NC_043044.1 Salmonid herpesvirus 2 isolate NeVTA ORF68-like >NC_043044.1 Salmonid herpesvirus 2 isolate NeVTA ORF68-like
>NC_043059.1 Caprine herpesvirus 2 glycoprotein B (gB) and DN >NC_043059.1 Caprine herpesvirus 2 glycoprotein B (gB) and DN
>NC_043062.1 Phocid herpesvirus 2 DNA-dependent DNA polymeras >NC_043062.1 Phocid herpesvirus 2 DNA-dependent DNA polymeras
>NC_043063.1 Iguanid herpesvirus 2 DNA-dependent DNA polymera >NC_043063.1 Iguanid herpesvirus 2 DNA-dependent DNA polymera
>NC_075563.1 Cervid alphaherpesvirus 2 strain Norway, complet >NC_075563.1 Cervid alphaherpesvirus 2 strain Norway, complet
>NC_075802.1 Salmonid herpesvirus 2 isolate NeVTA DNA polymer >NC_075802.1 Salmonid herpesvirus 2 isolate NeVTA DNA polymer
>NC_076512.1 Bovine alphaherpesvirus 2 strain C1Z FZR, comple >NC_076512.1 Bovine alphaherpesvirus 2 strain C1Z FZR, comple
>NC_076513.1 Macropodid alphaherpesvirus 2 strain V3077/08, c >NC_076513.1 Macropodid alphaherpesvirus 2 strain V3077/08, c
>NC_076966.1 Cacatuid alphaherpesvirus 2 isolate CaHV2/Melbou >NC_076966.1 Cacatuid alphaherpesvirus 2 isolate CaHV2/Melbou
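
The side-by-side diff above is long, and the one missing entry is easy to overlook. As a minimal sketch, assuming the two header files were produced exactly as in the commands above, the comparison can be reduced to only the headers that are in RefSeq but not in the Kraken library report. The function name `missing_from_kraken` is hypothetical and introduced only for illustration; it relies on standard `grep` options (`-F` fixed strings, `-x` whole-line match, `-v` invert, `-f` patterns from file).

```shell
# Hypothetical helper: print lines of the RefSeq header file ($2)
# that do not appear as whole lines in the Kraken header file ($1).
missing_from_kraken() {
    # -F: treat patterns as fixed strings (headers contain '.', '(' etc.)
    # -x: a pattern must match the entire line
    # -v: keep only lines that match none of the patterns
    # -f: read the patterns from the Kraken header file
    grep -F -x -v -f "$1" "$2"
}
```

Usage would be `missing_from_kraken krakendb.txt viralgenomic.txt`. With the two files shown above, this should print only the `NC_001798.2` Human herpesvirus 2 line. Because the comparison is on whole lines, any difference in how the two sources format a header would also be reported as missing.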