ncbi-vdb
Reduce maximum memory usage while reading SRA data?
Using the C VDB API (and following the fasterq-dump utility's strategy for accessing SRA records) to read SRA data can consume a significant amount of RAM. This can be an issue when attempting to minimize Cloud computing resources (i.e. instance RAM) while processing a large number of SRA records.
The maximum amount of RAM used while reading (as measured with /usr/bin/time -v) depends on the record:
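For reference, the same peak figure can also be sampled from inside the reading program, which makes it easier to see at which point in a record the memory grows. The following is a minimal sketch, assuming Linux (where getrusage() reports ru_maxrss in kilobytes); the helper names peak_rss_kb and report_peak_rss are hypothetical and not part of the VDB API.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Return the peak resident set size of the calling process in kilobytes,
 * or -1 on error. Assumes Linux, where ru_maxrss is given in kilobytes. */
long peak_rss_kb(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Print a labeled checkpoint to stderr, e.g. once every N rows read. */
void report_peak_rss(const char *label)
{
    fprintf(stderr, "%s: peak RSS = %ld kB\n", label, peak_rss_kb());
}
```

Calling report_peak_rss() at a few points in the read loop (for example after opening the cursor and then every few million rows) shows whether the peak is reached early or grows with the number of reads.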
Periodically calling VCursorRelease() and VCursorOpen() to force the VDB interface to deallocate RAM offers only a minor reduction in the maximum amount of RAM used (about 25%), and this strategy significantly slows down the rate at which an SRA record is read.
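To make the release-and-reopen strategy concrete, here is a minimal sketch of the loop structure only. It does not call the VDB library: read_row_fn and recycle_fn are hypothetical callbacks standing in for the code that reads one row and the code that releases the cursor and opens a fresh one, and count_row and count_recycle are stand-ins that merely count calls. The chunk size is an assumption to be tuned.

```c
#include <stdint.h>

/* Hypothetical callbacks standing in for the real VDB calls. */
typedef int (*read_row_fn)(int64_t row_id, void *ctx); /* read one row; 0 on success */
typedef int (*recycle_fn)(void *ctx);                  /* release cursor, open a fresh one; 0 on success */

/* Read rows [first, first + count) in order, recycling the cursor after
 * every chunk_rows rows. Returns 0 on success, -1 if chunk_rows is 0,
 * or the first nonzero value returned by a callback. */
int read_in_chunks(int64_t first, uint64_t count, uint64_t chunk_rows,
                   read_row_fn read_row, recycle_fn recycle, void *ctx)
{
    if (chunk_rows == 0)
        return -1;
    for (uint64_t i = 0; i < count; ++i) {
        /* Recycle at each chunk boundary, but not before the first row. */
        if (i != 0 && i % chunk_rows == 0) {
            int rc = recycle(ctx);
            if (rc != 0)
                return rc;
        }
        int rc = read_row(first + (int64_t)i, ctx);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Stand-in callbacks that only count calls, to show how the loop behaves. */
struct chunk_stats { uint64_t rows_read; uint64_t recycles; };

int count_row(int64_t row_id, void *ctx)
{
    (void)row_id;
    ((struct chunk_stats *)ctx)->rows_read++;
    return 0;
}

int count_recycle(void *ctx)
{
    ((struct chunk_stats *)ctx)->recycles++;
    return 0;
}
```

A larger chunk_rows means fewer recycles and less slowdown, at the cost of a higher peak between recycles, which is the trade-off described above.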
Is it possible/feasible to limit memory consumption using the VDB C API to sub-gigabyte levels, independent of the number of reads? The goal is to read through an SRA record once, as quickly as possible and using as little RAM as possible.