datamon
Change bundle upload and download logic to be less memory hungry
Currently the CAFS writer has to keep a record in the channel for each chunk that is uploaded until the flush is called. This leads to channel sizes large enough to handle the largest possible file. This needs to change so that the channels are drained in parallel.
This will also eliminate the max CAFS file size limitation for uploads. Not applicable to downloads.
I'm grouping the initial metric-collection setup with this task, since the plan is to collect metrics specific to the upload of large files.
My plan here is to create a separate executable that does the metric collection.
Slight scope shift here: the plan is to use a `--concurrency-factor` flag for both bundle upload and download that can be set between 1 and the upper bound on int, with sensible defaults.
While `--concurrency-factor` might initially map directly onto some particular value (such as the number of concurrent uploads or downloads), the external view need only expose some parameter to tune concurrency (and hence memory usage), nothing as specific as the number of files uploaded or downloaded.
- [ ] Extend to the read/download path
- [ ] Rename to concurrency/concurrency-factor? Open to suggestions.