
upload instrumentation

Open ransomw1c opened this issue 4 years ago • 2 comments

at One Concern, in addition to using the sidecar within Argo workflows, we distribute datamon to desktop with brew.

frequently, data scientists need to "ingest" (as we say) data into the Argo workflows that make up, for instance, the flood simulation pipeline(s) without running a pre-packaged ingestor workflow. sometimes there's a 500 error, or bundle upload or bundle mount new fails for one reason or another. this task proposes to begin addressing that pain point, which is already solved in part by the fact that duplicate blobs (2k chunks) aren't uploaded twice.

specifically, the idea is to instrument the paths from desktop to cloud (bundle upload, bundle mount new, etc.). the instrumentation could be written in golang in the binary, as a shell script as in the sidecar, or in Python (bindings exist in #393, which hasn't been merged only because of documentation requirements). it should provide

  • metrics and usage statistics to improve datamon
  • progress indicators, logging, and a smoother experience for data-science
  • any and all additional tracing, timing, and output formatting to ease backpressure on this known issue
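
to make the three bullets above concrete, here is a minimal Python sketch of what such instrumentation might look like. nothing in it is datamon's actual API: `instrumented`, `Metrics`, `OpRecord`, and `copy_with_progress` are hypothetical names, the operation being wrapped is any callable (for example, a function that shells out to the CLI), and the optional retry is an assumption that only makes sense because already-uploaded chunks are skipped on a re-run.

```python
import io
import logging
import time
from dataclasses import dataclass, field
from typing import Any, BinaryIO, Callable, List, Optional

log = logging.getLogger("upload-instrumentation")  # hypothetical logger name


@dataclass
class OpRecord:
    """One timed attempt at an operation (hypothetical schema)."""
    name: str
    attempt: int
    seconds: float
    ok: bool
    error: Optional[str] = None


@dataclass
class Metrics:
    """In-memory collection of attempts; a real version might ship these somewhere."""
    records: List[OpRecord] = field(default_factory=list)

    def failures(self, name: str) -> int:
        return sum(1 for r in self.records if r.name == name and not r.ok)


def instrumented(name: str, op: Callable[[], Any], metrics: Metrics,
                 retries: int = 0,
                 clock: Callable[[], float] = time.monotonic) -> Any:
    """Run op(), timing and logging every attempt; re-raise the last failure."""
    for attempt in range(1, retries + 2):
        start = clock()
        try:
            result = op()
        except Exception as exc:
            elapsed = clock() - start
            metrics.records.append(OpRecord(name, attempt, elapsed, False, repr(exc)))
            log.warning("%s: attempt %d failed after %.2fs: %s", name, attempt, elapsed, exc)
            if attempt == retries + 1:
                raise
        else:
            elapsed = clock() - start
            metrics.records.append(OpRecord(name, attempt, elapsed, True))
            log.info("%s: attempt %d succeeded in %.2fs", name, attempt, elapsed)
            return result


def copy_with_progress(src: BinaryIO, dst: BinaryIO,
                       report: Callable[[int], None],
                       chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst, calling report(total_bytes_so_far) after each chunk."""
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return sent
        dst.write(chunk)
        sent += len(chunk)
        report(sent)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    metrics = Metrics()
    payload = io.BytesIO(b"x" * 200_000)  # stand-in for a local file
    sink = io.BytesIO()                   # stand-in for the remote side
    instrumented(
        "bundle upload",
        lambda: copy_with_progress(payload, sink, lambda n: log.info("%d bytes sent", n)),
        metrics,
        retries=1,
    )
```

the wrapper stays out-of-band: it only needs a name and a callable, so the same shape could sit around a subprocess call to the brew-installed binary or around the Python bindings, and the collected records could feed either usage statistics or a progress display.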

ransomw1c avatar Apr 16 '20 05:04 ransomw1c

this'd be a great starter issue because it's not cloud-specific (minor changes would allow a fork that syncs your, the user/programmer's, local disk with arbitrary filesystem-like things) and the provided patch would be mostly out-of-band/orthogonal/... to the rest of the datamon implementation.

ransomw1c avatar Apr 16 '20 05:04 ransomw1c

i should also mention that #413 takes an alternate approach to the same essential use-case of adding additional data sources from desktop. the idea in that proposal, again, is to allow arbitrary first-miles into the cluster, then let the web scheduler fully digest the data into datamon, DRY-style.

mr. rod serling

ransomw1c avatar Apr 16 '20 06:04 ransomw1c