datamon
Change bundle upload and download logic to be less memory hungry
Currently the CAFS writer has to keep a record in the channel for each chunk that is uploaded until the flush is called. This leads to channel sizes large enough to handle the largest possible file. This needs to change so that the channels are drained in parallel.
This will also eliminate the max CAFS file size limitation for uploads. Not applicable to downloads.
I'm grouping the initial metric-collection setup with this task, since the plan is to collect metrics specific to the upload of large files.
My plan here is to create a separate executable that does the metric collection.
Slight scope shift here: the plan is to use a `--concurrency-factor` flag for both bundle upload and download that can be set between 1 and the upper bound on int, with sensible defaults.
While `--concurrency-factor` might initially map directly onto some particular value (such as the number of concurrent uploads or downloads), the external view need only expose some parameter to tune concurrency (and hence memory usage), nothing as specific as the number of files uploaded or downloaded.
- [ ] Extend to the read/download path
- [ ] Rename to concurrency/concurrency-factor? Open to suggestions.