khmer
I ran digital normalization on a 71.3 GB metagenome file and got a 47.7 GB file afterward. Is that possible?
Shall I post my commands so you can understand the problem better? Any help in this matter would be appreciated.
Sorry, typing error: not 47.7 GB but 4.7 GB.
Hi @khuch123, this would certainly be reasonable for RNA-seq, a not-very-diverse metagenome, or a very high-coverage genome. If you could post the command (in particular the -M parameter), that would help us take a look. What are you sequencing?
./normalize-by-median.py -k 20 -C 20 -N 4 -x 5e8 -p --savegraph normC20k20.kh final.fq
I tried the -M parameter, but it did not work together with -x. I found this command in the Kalamazoo metagenome assembly protocols.
Ahh, yes. OK, I think the main problem I see is that you are using a very low amount of memory. I would suggest replacing -x 5e8 with -M 20e9 to use 20 GB of memory (or more; use as much as you have).
This should result in many more reads being kept :)
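For a rough sense of scale, here is a small Python sketch comparing the two settings. It assumes a countgraph stores one byte per counter, so -N 4 -x 5e8 comes to 4 × 5e8 bytes, about 2 GB, which is consistent with the "2 GB" memory message reported further down the thread. `countgraph_bytes` is a hypothetical helper, not part of khmer.

```python
# Approximate memory footprint of a khmer countgraph configured with -N and -x.
# Assumption: one byte per counter, so total = N tables * x entries * 1 byte.

BYTES_PER_COUNTER = 1  # assumed for a countgraph


def countgraph_bytes(n_tables: int, tablesize: float) -> float:
    """Approximate memory (bytes) for n_tables tables of tablesize counters each."""
    return n_tables * tablesize * BYTES_PER_COUNTER


old = countgraph_bytes(4, 5e8)  # -N 4 -x 5e8, from the original command
new = 20e9                      # -M 20e9 sets the total budget directly

print(f"old: {old / 1e9:.0f} GB, new: {new / 1e9:.0f} GB, ratio: {new / old:.0f}x")
```

Under that assumption, switching to -M 20e9 gives the graph about ten times as much room, which is why far fewer reads should be discarded as false "high coverage" hits.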
There should have been a warning about "use more memory" at the bottom of the output of normalize-by-median - was there?
best, --titus
Okay, sir, I will do as directed and get back to you. No warning message was given, but it did report that it was using 2 GB of memory.
You have been really helpful. Thank you so much!
If I want to use more than 20 GB of memory, what should the command be? Will it still be -M 20e9?
The -x and -N parameters used to be the only way to set memory. They are still an option, but the -M parameter is much easier. It accepts human-readable suffixes, so something like -M 20G is valid for using 20 GB of memory, and if you want to increase to 36 GB you can use -M 36G.
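As a rough illustration of how a single -M budget relates to the older -N/-x pair, the sketch below divides the budget evenly across the tables. It assumes a default of 4 tables and one byte per counter; `tablesize_for_budget` is a hypothetical helper, and khmer itself may adjust the exact table sizes (for example, to nearby primes).

```python
# Sketch: split a total memory budget (bytes) evenly into N countgraph tables.
# Assumptions: default of 4 tables, one byte per counter.

DEFAULT_N_TABLES = 4  # assumed default for -N


def tablesize_for_budget(max_memory: float, n_tables: int = DEFAULT_N_TABLES) -> int:
    """Counters per table that fit in max_memory bytes."""
    return int(max_memory // n_tables)


for label, budget in [("2G", 2e9), ("20G", 20e9), ("36G", 36e9)]:
    print(label, "->", tablesize_for_budget(budget), "counters per table")
```

Under these assumptions, a 2 GB budget with 4 tables works out to 5e8 counters per table, the same as the original -N 4 -x 5e8.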
See the khmer docs for a more thorough discussion.
I read through the docs, but why is there an e in -M 20e9?
This is shorthand for 20 × 10⁹. http://python-reference.readthedocs.io/en/latest/docs/float/scientific.html
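Since khmer's scripts are Python, the same notation parses directly as a float. This small check shows that "20e9" is exactly 20 × 10⁹, i.e. 20 GB in decimal units:

```python
# "20e9" is scientific (E) notation: 20 multiplied by 10 to the 9th power.
value = float("20e9")

assert value == 20 * 10**9          # same number, written out in full
print(f"{value:,.0f} bytes")        # digits grouped with commas for readability
print(value / 1e9, "GB")            # decimal gigabytes (1 GB = 10**9 bytes)
```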
It's a convenient notation so that you don't have to type out tons of 0s, from back when -x and -N were the only way to set memory usage.
If you don't specify a suffix like 200M or 20G, then by default the number represents the number of bytes you want to use.
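Putting the two rules together, a memory setting is either a plain or scientific-notation number of bytes, or a number with a size suffix. The parser below is a hypothetical sketch (`parse_memory` is not a khmer function) that assumes decimal multipliers (1G = 1000³ bytes) and the suffixes K, M, G and T; khmer's own argument handling may differ in detail.

```python
# Hypothetical helper: convert a memory setting such as "20e9", "200M" or "36G"
# into a number of bytes. Assumes decimal multipliers (1G = 1000**3 bytes).

SUFFIXES = {"K": 1000**1, "M": 1000**2, "G": 1000**3, "T": 1000**4}


def parse_memory(label: str) -> float:
    """Return the number of bytes represented by label."""
    try:
        # No suffix: plain or scientific notation is taken as raw bytes.
        return float(label)
    except ValueError:
        pass
    # Otherwise split off the last character and treat it as a size suffix.
    number, suffix = label[:-1], label[-1:].upper()
    if suffix not in SUFFIXES:
        raise ValueError(f"cannot parse memory setting {label!r}")
    return float(number) * SUFFIXES[suffix]


for setting in ["2000000000", "20e9", "200M", "20G", "36G"]:
    print(setting, "->", parse_memory(setting), "bytes")
```

With this reading, -M 20e9 and -M 20G request the same 20 GB budget.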
Did this answer your question(s) @khuch123?