krakenuniq
A very long running time for krakenuniq
Hello,
We want to run krakenuniq for datasets containing 100-200 samples of about 25-30M reads each.
So far we have not been able to complete a single run, even when using only one of the paired-end read files, or even when taking only 100,000 reads from a sample.
It seems as if it hardly uses any CPU.
An example run:
nohup ./krakenuniq --db DBDIR-microbialDB --threads 40 --report-file REPORT_FILE_smallfastq_100000 ../kneaddata_output/kneaddata_output_3/SRR233_1_kneaddata.trimmed.1.100000.fastq --preload > smallfastqfile-100000.out 2>&1 &
We are using KrakenUniq version 1.0.4. The database is NCBI nt.
Thank you! Sheerli
You don't have enough RAM. I don't know how you built a KrakenUniq database for NCBI nt, but that would be really enormous. (The Kraken2 DB is smaller but still very large.) If you downloaded the DB from our index page, then it's not nt, but it still needs over 400 GB of RAM. Try running without "--preload", which tells KrakenUniq to load the entire DB before processing any reads, no matter how small the input file is. That sometimes fixes it quickly. Alternatively, use "--preload-size N", where N is less than half of the RAM you have available. That will do it too.
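As a rough sketch of the "--preload-size N" suggestion, the snippet below picks a value just under half of the machine's RAM and prints the resulting command instead of running it. The helper names (preload_size_gb, total_ram_gb, print_krakenuniq_cmd), the file names, and the "G" suffix on the size are assumptions for illustration; check the value format against your KrakenUniq version before using it.

```shell
# Pick a --preload-size value below half of the machine's RAM (whole GB).
# The total is passed in explicitly so the arithmetic is easy to check.
preload_size_gb() {
    total_gb=$1
    # Integer division, minus 1 GB to stay strictly under half.
    echo $(( total_gb / 2 - 1 ))
}

# Total RAM in GB from /proc/meminfo (Linux only); MemTotal is given in kB.
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Print (do not run) the command so the values can be inspected first.
# DBDIR, report.tsv and reads.fastq are placeholder names.
print_krakenuniq_cmd() {
    size_gb=$(preload_size_gb "$1")
    echo "krakenuniq --db DBDIR --preload-size ${size_gb}G --threads 40 --report-file report.tsv reads.fastq"
}
```

For example, `print_krakenuniq_cmd 190` uses the RAM figure given by hand, while `print_krakenuniq_cmd "$(total_ram_gb)"` reads it from the machine.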
Thank you for your quick response Steven!
Our machine has 190 GB of RAM; we use an Amazon EC2 c5d.24xlarge. We are using the indexed DB that you have published. We tried a preload size of 90 GB, as you suggested, and 60 threads. The running time has gone down substantially, to about 1 hour and 20 minutes for one sample!
We would like to know if there is another way to reduce the running time of one metagenomic sample as we have about 200 samples.
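To put the numbers above in perspective, here is a back-of-the-envelope estimate of the total time if all samples were run one after another. It assumes every sample takes the same 80 minutes as the one measured, which may not hold; estimate_hours is just an illustrative helper name.

```shell
# Sequential wall-clock estimate: samples x minutes per sample, in whole hours.
estimate_hours() {
    samples=$1
    minutes_per_sample=$2
    echo $(( samples * minutes_per_sample / 60 ))
}

# 200 samples at 80 minutes each.
estimate_hours 200 80   # prints 266 (hours, rounded down), i.e. about 11 days
```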
Additionally, the "--preload-size N" option you suggested isn't listed in the output of the help command. Is there another manual besides what is on GitHub?
Thank you! Sheerli
@salzberg @alekseyzimin we made a KrakenUniq NCBI NT database publicly available here https://doi.org/10.17044/scilifelab.20205504 together with our aMeta ancient metagenomic tool publication https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-03083-9. Perhaps you will find it interesting. In the fields of ancient microbial and ancient environmental metagenomics, we often need to profile our data against very large databases that consume a few TB of RAM, so your recent KrakenUniq development that allowed database chunking was very useful for us, thank you very much! The database I mentioned was built on a node with 4 TB of RAM using 80 threads, and the build took approximately two weeks.
Thank you very much for letting us know! The new database will be very useful for data sets where species composition is unknown.
In reply to Nikolay Oskolkov's comment of Wed, Aug 28, 2024, 1:02 AM.
-- Dr. Alexey V. Zimin Associate Research Scientist Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA (301)-437-6260 website http://ccb.jhu.edu/people/alekseyz/ blog http://masurca.blogspot.com