upload instrumentation
At One Concern, in addition to using the sidecar within Argo workflows, we distribute datamon to the desktop with brew.
Frequently, data scientists need to "ingest," as we say, data into the Argo workflows comprising, for instance, the flood simulation pipeline(s), without running a pre-packaged ingestor workflow. Sometimes there's a 500 error, or `bundle upload` or `bundle mount new` fails for one reason or another. This task proposes to begin addressing that pain point, which is already solved in part by the fact that duplicate blobs (2k chunks) aren't uploaded twice (see the sketch below).
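For context, here's a minimal sketch of that dedup behavior, assuming content-addressed chunks; `BlobStore`, `uploadChunk`, and the choice of SHA-256 are illustrative stand-ins, not datamon's actual types or hashing:

```go
package dedup

import (
	"crypto/sha256"
	"encoding/hex"
)

// BlobStore stands in for whatever backs blob storage (e.g. a GCS bucket).
type BlobStore interface {
	Has(key string) (bool, error)
	Put(key string, data []byte) error
}

// uploadChunk writes a chunk only when its content hash isn't already
// stored, so retrying a failed upload mostly skips work already done.
func uploadChunk(store BlobStore, chunk []byte) (key string, skipped bool, err error) {
	sum := sha256.Sum256(chunk)
	key = hex.EncodeToString(sum[:])
	exists, err := store.Has(key)
	if err != nil || exists {
		return key, exists, err
	}
	return key, false, store.Put(key, chunk)
}
```

Because chunk keys are derived from content, a retried `bundle upload` only pays for the chunks that never made it to the cloud.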
Specifically, the idea is to instrument (via Go in the binary, shell script as in the sidecar, or Python, for which bindings exist in #393, unmerged only because of documentation requirements) the paths from desktop to cloud (`bundle upload`, `bundle mount new`, etc.) to provide:
- metrics and usage statistics to improve datamon
- progress indicators, logging, and a smoother experience for data scientists
- any and all additional tracing, timing, and output formatting to ease backpressure on this known issue (a rough sketch of one such wrapper follows)
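As a sketch of what the Go-side instrumentation might look like, an `io.Reader` wrapper can count and log bytes as an upload streams them, keeping progress reporting and timing out-of-band from the upload path itself; `progress.Reader` is purely illustrative and doesn't exist in datamon:

```go
package progress

import (
	"fmt"
	"io"
	"os"
	"time"
)

// Reader wraps the stream feeding an upload, counting bytes and logging
// throughput about once a second.
type Reader struct {
	r       io.Reader
	total   int64 // expected bytes, for the progress line
	read    int64
	start   time.Time
	lastLog time.Time
}

func NewReader(r io.Reader, total int64) *Reader {
	now := time.Now()
	return &Reader{r: r, total: total, start: now, lastLog: now}
}

func (p *Reader) Read(buf []byte) (int, error) {
	n, err := p.r.Read(buf)
	p.read += int64(n)
	if time.Since(p.lastLog) >= time.Second || err == io.EOF {
		secs := time.Since(p.start).Seconds()
		fmt.Fprintf(os.Stderr, "uploaded %d/%d bytes in %.1fs (%.1f MB/s)\n",
			p.read, p.total, secs, float64(p.read)/1e6/secs)
		p.lastLog = time.Now()
	}
	return n, err
}
```

Because the wrapper satisfies plain `io.Reader`, it slots in front of whatever reader the upload path already consumes, which is part of what keeps such a patch orthogonal to the rest of datamon.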
This'd be a great starter issue because it's not cloud-specific (minor changes would allow a fork that syncs your local disk, you being the user/programmer, with arbitrary filesystem-like things), and the proposed patch is mostly out-of-band/orthogonal to the rest of the datamon implementation.
I should also mention that there is an alternate approach to the same essential use case, adding additional data sources from the desktop, in #413. The idea in that proposal, again, is to allow arbitrary first miles into the cluster, then let the web scheduler fully digest the data into datamon, DRY style.