khmer

Open khuch123 opened this issue 7 years ago • 13 comments

I ran digital normalization on a metagenome file of 71.3 Gb and got a file of 47.7 Gb afterwards. Is that possible?

Shall I post my commands so you can see what I did? Any help with this would be appreciated.

khuch123 avatar Feb 23 '18 14:02 khuch123

Sorry, typing error: not 47.7 Gb but 4.7 Gb.

khuch123 avatar Feb 23 '18 14:02 khuch123

Hi @khuch123, this would certainly be reasonable for RNAseq, a not-very-diverse metagenome, or a very high-coverage genome. If you could post the command (in particular the -M parameter), that would help us take a look. What are you sequencing?

ctb avatar Feb 23 '18 14:02 ctb

./normalize-by-median.py -k 20 -C 20 -N 4 -x 5e8 -p --savegraph normC20k20.kh final.fq

khuch123 avatar Feb 23 '18 14:02 khuch123

I tried the -M parameter, but it did not work together with the -x option. I took this command from the Kalamazoo metagenome assembly protocol.

khuch123 avatar Feb 23 '18 15:02 khuch123

Ahh, yes. OK, I think the main problem I see is that you are using a very low amount of memory. I would suggest replacing -x 5e8 with:

-M 20e9

to use 20 GB of memory (or more - use as much as you have).

This should result in many more reads being kept :)

There should have been a warning about "use more memory" at the bottom of the output of normalize-by-median - was there?

best, --titus

ctb avatar Feb 23 '18 15:02 ctb

Okay, sir, I will do as directed and get back to you. No warning message was given, however; it only reported that it was using 2 GB of memory.

You are really very helpful. Thank you so much.

khuch123 avatar Feb 23 '18 17:02 khuch123

If I want to use more than 20 GB of memory, what should the command be? Will it still be

-M 20e9

khuch123 avatar Feb 23 '18 17:02 khuch123

The -x and -N parameters used to be the only way to set memory. They are still an option, but the -M parameter is much easier. It accepts human-readable suffixes, so something like -M 20G is valid for using 20 GB of memory, and if you want to increase to 36 GB you can use -M 36G.
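As a rough sanity check on the command posted earlier, here is a small Python sketch of how -N and -x translate into memory. It assumes each table entry takes one byte, which is not stated in this thread; the function name `table_memory_bytes` is made up for illustration and is not part of khmer.

```python
def table_memory_bytes(n_tables, table_size, bytes_per_entry=1):
    """Estimate memory for n_tables tables of table_size entries each.

    ASSUMPTION: one byte per entry. This is an illustrative estimate,
    not khmer's own accounting.
    """
    return int(n_tables * table_size * bytes_per_entry)


# Values from the command above: -N 4 -x 5e8
old_setting = table_memory_bytes(4, 5e8)
print(old_setting / 1e9)  # 2.0 (GB)
```

Under that assumption, -N 4 -x 5e8 comes to about 2 GB, which is consistent with the "using 2 GB memory" message reported above and a tenth of what -M 20e9 requests.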

standage avatar Feb 23 '18 17:02 standage

See the khmer docs for a more thorough discussion.

standage avatar Feb 23 '18 17:02 standage

I read through the docs, but why the e in -M 20e9?

khuch123 avatar Feb 23 '18 17:02 khuch123

This is shorthand for 20 × 10^9. http://python-reference.readthedocs.io/en/latest/docs/float/scientific.html

It is a convenient notation so that you don't have to type out tons of 0s, dating from when -x and -N were the only way to set memory usage.
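To see the notation in action, this is plain Python with nothing khmer-specific in it: the string "20e9" parses to the same number as 20 followed by nine zeros.

```python
# "20e9" means 20 times 10 to the power 9.
value = float("20e9")

print(value == 20 * 10**9)  # True
print(int(value))           # 20000000000
```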

standage avatar Feb 23 '18 17:02 standage

If you don't specify a suffix like 200M or 20G, then by default the number represents the number of bytes you want to use.
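The suffix rule can be sketched in a few lines of Python. This is a hypothetical helper (`parse_memory` is an invented name) that assumes decimal units, so G means 10^9 bytes; khmer's actual parsing may differ in detail.

```python
# ASSUMPTION: decimal multipliers (K = 10^3, M = 10^6, G = 10^9, T = 10^12).
SUFFIXES = {"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def parse_memory(label):
    """Convert a string such as '20G', '200M' or '20e9' to a byte count."""
    try:
        # No suffix: the number is already in bytes.
        return float(label)
    except ValueError:
        number, suffix = label[:-1], label[-1:].upper()
        if suffix not in SUFFIXES:
            raise ValueError(f"cannot parse memory setting {label!r}")
        return float(number) * SUFFIXES[suffix]


print(parse_memory("20G") == parse_memory("20e9"))  # True
print(parse_memory("200M"))                         # 200000000.0
```

Read this way, -M 20G and -M 20e9 ask for the same amount of memory, and -M 36G is the equivalent of -M 36e9.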

standage avatar Feb 23 '18 17:02 standage

Did this answer your question(s) @khuch123?

standage avatar Apr 02 '18 20:04 standage