k6 icon indicating copy to clipboard operation
k6 copied to clipboard

Time Series data model

Open codebien opened this issue 3 years ago • 5 comments

We want to introduce the time series concept to k6 for getting the benefit of efficiently identifying the combination between a metric and the relative tags.

A summary of the benefits it brings across the platform:

  • It allows executing faster lookups and comparisons by a single value, the ID (hash) of the series.
  • It reduces the storage by allocating just one time a possible combination (the time series) and referencing it in all other places.
  • It allows aggregating data by time series.
  • It enables easier detection of high cardinality issues just by counting the entire unique data set of generated time series.
  • It enables time series based outputs to simplify the integration's logic (e.g. Prometheus).

Data model

Metrics and Series

  • Metric (name, type)
  • Time Series (metric, tags)
  • Sample (timestamp, value, time series ref)
  • Tag - (aka k6/1831)
// This is the current k6 Metric polished from the Thresholds dependencies.
type Metric struct {
	name     string
	type     MetricType
	contains ValueType
}

type TagSet atlas.Node

type TimeSeries struct {
        Metric *TimeSeries
        Tags *TagSet
}

type Sample struct {
        TimeSeries *TimeSeries
        Timestamp uint64
        Value float64
}

Sink

The sinks are implemented by metric types and they keep values up to date by time series and/or aggregated views:

  • Counter: a monotonic increasing value
  • Gauge: the latest value
  • Rate: the percentage of non-zero values across the total of the events
  • Trend as SparseHistogram: counters per dynamic ranges (buckets) - https://grafana.com/blog/2021/11/03/how-sparse-histograms-can-improve-efficiency-precision-and-mergeability-in-prometheus-tsdb

Storage

It will store the root of Atlas in the metrics.Registry and it will provide a method for get it for branching out a new TagSet.

type Registry struct {
        ...
	tagSetRoot *atlas.Node
}

func (r *Registry) RootTagSet() *TagSet {
	return (*TagSet)(r.tagSetRoot)
}

Samples generation

The current sampling process is controlled by the metrics.PushIfNotDone method. All the actual callers should resolve the time series from the storage before pushing a new Sample, or in the event, no one is found the storage will create and insert a new one. It requires the dependency from the time series database for all the callers (e.g. executors, JavaScript modules). Most of them already have the metrics.Registry dependency.

metrics.Ingester

Ingester is responsible for resolving the entire set of Sinks impacted from the ingested time series then it adds the Sample's value to the resolved Sinks.

Example

In a t0 where the status of the seen time series is:

http_req_duration{status:200,method:GET}
http_req_duration{method:GET}
http_req_duration{status:200}
http_req_another_one{status:200}

The Ingester in the case of a collected Sample http_req_duration{status:200,method:GET} then it resolves the dependencies with the other seen time series in a unique set where it contains the following time series http_req_duration{status:200} and http_req_duration{method:GET}. It can now resolve the relative Metric's Sinks and it invokes them passing the Sample's value.

Known issues

  • Name and URL tags: k6 is tagging HTTP requests with the URL. It will create a high cardinality issue for the time series data model. This should be fixed by adding the possibility to not store all the tags as indexable, having the availability to set them as metadata and exclude them from the time series generation. An alternative workaround could be to exclude them from the default set of enabled tags.
  • We need to keep all the data for the Trend type for computing the percentiles. We plan to migrate to some form of approximation (Sparse Histogram, OpenHistogram, T-Digest, etc..). https://github.com/grafana/k6/issues/763

codebien avatar Jun 29 '22 09:06 codebien

The Sample struct will probably have an extra field for the metadata / non-indexed tags needed for https://github.com/grafana/k6/issues/2128 (among other things). That shouldn't affect anything else though, these tags will just be ignored when building the TimeSeries.

na-- avatar Jun 29 '22 10:06 na--

It's relevant to mention that we are not going probably to implement the entire refactoring in a single iteration. The first expected work should be:

  • Add the TimeSeries struct as described in the description.
  • Add the TimeSeries pointer to the Sample struct, getting the following result:
// A Sample is a single measurement.
type Sample struct {
	Metric *Metric
	Time   time.Time
	Tags   *SampleTags
	Value  float64
        TimeSeries *TimeSeries
}
  • Populate the new field invoking registry.GetOrCreateSeries before invoking PushIfNotDone from all the parts where the samples are generated.
  • ~~Migrate the metrics.Ingester's logics to use the TimeSeries.~~ (EDIT: this point can be moved in the next iterations as well).
  • Announce the future breaking change to the metrics API in a warning or mention in the documentation and recommend migrating to Sample.TimeSeries field. This will be very important for output extensions.

It enables us to start fixing issues blocked by this specific concept but also to not break change with a bad UX.

Only in the next iterations, the redundant fields from the Sample struct will be removed and the Metric struct will be refactored as described in the description.

codebien avatar Jun 29 '22 16:06 codebien

I have some questions(no particular order):

k6 is tagging HTTP requests with the URL. It will create a high cardinality issue

I would expect this will be addressed first?

We need to store all the time series generated from k6 run during the executio

Does that mean that we will forever hold all the tags + the name for each timeseries or just the hash?

Tag - (aka https://github.com/grafana/k6/issues/1831)

is there are particular thing from that issue that is intended to be implemented? It doesn't seem clear and having them as []Tag seems like it won't lend it self to actually at least some of the propositions there. IIRC a lot of the propositions there are basically to have the whole []Tag be one struct that we can construct effeciently and then we come up with different definitions of "efficiently" and ways to accomplish them.

After I started looking into this last week and wrote some thoughts down, I did spend some time over the weekend to make https://github.com/mstoykov/atlas (mostly on the benchmarks actually :roll_eyes:).

I kind of wonder if something similar can not replace the SampleTags internal representation and then to over time change the code to make a better use of it. This likely will be better of tried really fast to see if it doesn't totally destroy the performance of k6, before we spend time to make it not breaking as possible and then for it to turn out to not work great.

mstoykov avatar Jul 04 '22 08:07 mstoykov

I would expect this will be addressed first?

We have parallel discussions and I expect we will have parallel implementations. We can set it as an acceptance criteria for merging but I wouldn't set it as a requirement for starting the development. In the eventual case, that we can't achieve the final design for fixing the url tag before we are ready to merge, I think we can work around discarding the url tag from the TimeSeries generation if name and url are set with different values.

Does that mean that we will forever hold all the tags + the name for each timeseries or just the hash?

We have to keep all the tags in some place otherwise we can't map them from the Outputs. So, it should be convenient to make the TimeSeries the central place where they live. (Note: in this way, the proposal for the first step makes them duplicated between the TimeSeries and the Sample structs).

is there are particular thing from that issue that is intended to be implemented?

Yes, the internal tags field and the (TimeSeries) GetTags() []Tag method. But only if we have a consensus for the final design of Tag. I will back on this after an evaluation of your proposal.

codebien avatar Jul 04 '22 10:07 codebien

The design evolved in a better direction, the summary of the current design can be found at https://github.com/grafana/k6/pull/2594#issuecomment-1205273413 and https://github.com/grafana/k6/pull/2594#issuecomment-1205359198. The issue's description has been updated also.

codebien avatar Aug 04 '22 14:08 codebien

The main part has been implemented and it is now available in master. I keep this issue open until I haven't split the remaining parts from this issue into smaller and dedicated issues.

codebien avatar Oct 04 '22 14:10 codebien

I opened https://github.com/grafana/k6/issues/2735 which is the missing part from this issue. HDR histogram and metric's structs refactor have already well-known open issues.

codebien avatar Oct 17 '22 15:10 codebien