ovis
store split keys
The CSV store and other container-oriented stores produce unwieldy containers (very large single files) for data sets that report the same metrics for many entities. The storage policy architecture routes data by schema only; there is no way to filter by instance name.
One way to alleviate this forced coarseness of data storage and analytics, which is troublesome at large scale, is to let the user of a store (CSV, for instance) define a list of metrics that serve as the keys on which file and subdirectory splits are done. For example, for a schema containing a device name (a metric such as port in opa2), the CSV store could produce one file per device name if configured with keys=port.
Similarly, for export to users by job, we could configure a CSV store with keys=user,job_id so that output ends up in $path/$user/$job_id/meminfo
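The path layout described above can be sketched in a few lines of Python. This is only an illustration of the proposed behavior, not existing store code: the function names split_path and store_rows, the sample rows, and the choice to replace path separators inside key values are all assumptions made for the example.

```python
import csv
import os
import tempfile


def split_path(root, schema, keys, row):
    """Build root/<key value 1>/<key value 2>/.../schema for one row.

    Hypothetical helper: path separators inside a key value are replaced
    with "_" so that a value cannot create extra directory levels.
    """
    parts = [str(row[k]).replace(os.sep, "_") for k in keys]
    return os.path.join(root, *parts, schema)


def store_rows(root, schema, keys, rows):
    """Append each row to the CSV file selected by its key values.

    Reopening the file for every row keeps the sketch short; a real store
    would keep file handles open.
    """
    written = set()
    for row in rows:
        path = split_path(root, schema, keys, row)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        new_file = not os.path.exists(path)
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(row))
            if new_file:
                writer.writeheader()  # header only once per split file
            writer.writerow(row)
        written.add(path)
    return sorted(written)


# Made-up sample data standing in for a meminfo-like schema.
rows = [
    {"timestamp": 1, "user": "alice", "job_id": 101, "MemFree": 5000},
    {"timestamp": 2, "user": "alice", "job_id": 101, "MemFree": 4900},
    {"timestamp": 1, "user": "bob", "job_id": 202, "MemFree": 7000},
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for path in store_rows(root, "meminfo", ["user", "job_id"], rows):
            print(os.path.relpath(path, root))
```

With keys=user,job_id the three sample rows land in two files, one per (user, job_id) pair, rather than in a single meminfo file.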
Similarly, sos containers could be scaled to weekly system data volumes with keys=ProducerName, which would yield per-producer containers instead of a single container, 1800 times larger, holding data from 1800 producers. This would let us look at a single node over a much longer time window; we have jobs at SNL that run anywhere from 2 weeks to indefinitely, so this would be a substantial win in the analytics pipeline.