
Reduce maximum memory usage while reading SRA data?

Open jgans opened this issue 1 year ago • 2 comments

Reading SRA data with the C VDB API (following the fasterq-dump utility's strategy for accessing SRA records) can consume a significant amount of RAM. This becomes a problem when trying to minimize cloud computing resources (i.e., instance RAM) while processing a large number of SRA records.

The maximum amount of RAM used while reading (as measured with /usr/bin/time -v) depends on the record:

[Image: maximum RAM used while reading, by SRA record]

Periodically calling VCursorRelease() and VCursorOpen() to force the VDB interface to deallocate RAM offers a modest reduction in the maximum amount of RAM used (about 25%), but this strategy significantly slows down the rate at which an SRA record is read.

Is it possible (or feasible) to limit memory consumption with the VDB C API to sub-gigabyte levels, independent of the number of reads? The goal is to read through an SRA record once, as quickly as possible, using as little RAM as possible.

jgans, Mar 28 '23 01:03