spades
Looking for PlasmidDatabase and non-Plasmid Contigs dataset
Hi there,
I'm looking for the two datasets named in the title. The link given in the paper (https://genome.cshlp.org/content/29/6/961.full), http://data.cab.spbu.ru/index.php/s/tz7mCqDipgbcsbW, returns a file-not-found error.
Hope all is well, Thanks
Assigning the first author of the metaplasmidSPAdes paper. He will certainly be able to help you.
Hi. Thank you for noticing that the link is down; we'll fix it.
However, we did not upload these databases. That link shared only the metaplasmidSPAdes results, not the contigs used for plasmidVerify training/testing.
The construction of these databases is described in the text: the PlasmidDatabase data set contains all 9937 plasmids from the RefSeq database (total length 1007 Mb), and the nonPlasmidDatabase data set contains a randomly selected 10% of complete bacterial chromosomes from RefSeq (837 bacterial genomes with total length 3229 Mb). Both databases were then randomly split (70/30) into training and testing datasets, and the nonPlasmid testing sequences were additionally split into 10 kb chunks to better represent real fragmented assemblies.
We can share the exact files we used, but if you want to retrain plasmidVerify I'd recommend using more data: the RefSeq plasmid DB has been extended since we started development.
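If you want to rebuild comparable datasets yourself, the 70/30 split and the 10 kb chunking could look roughly like the sketch below. This is not the script used for the paper: the function names, the fixed seed, splitting per sequence record, and keeping a trailing chunk shorter than 10 kb are all assumptions made for illustration.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle (name, sequence) records and split them into train/test lists.

    Assumption: the split is done per record; the seed is arbitrary.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def chunk_record(name, sequence, chunk_size=10_000):
    """Cut one sequence into consecutive chunks of at most chunk_size bases.

    Assumption: a final chunk shorter than chunk_size is kept, not dropped.
    """
    return [
        (f"{name}_chunk{start // chunk_size}", sequence[start:start + chunk_size])
        for start in range(0, len(sequence), chunk_size)
    ]
```

You would apply `split_train_test` to the plasmid and non-plasmid records separately, then run `chunk_record` over the non-plasmid test records only.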