Refactor metric samples to use Atlas for tags and have time series
Integrated https://github.com/mstoykov/atlas as the internal data structure for `SampleTags`. Atlas is a B-tree implementation whose nodes store the tag pairs as `[2]string` arrays. It mostly uses pointers for comparisons, so it's very fast and read locks can be avoided. It allocates a specific tag pair only once. The memory reduction in my basic benchmarks is significant: for example, a benchmark of a similar function reduces the overall allocated memory from 600MB to 87MB.
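For illustration, here is a minimal sketch of the deduplication property described above, assuming only the public atlas API (`New`, `AddLink`) from the linked repository:

```go
package main

import (
	"fmt"

	"github.com/mstoykov/atlas"
)

func main() {
	root := atlas.New()

	// Branching the same tag pair out of the same parent node twice
	// returns the already-allocated node instead of a new one...
	n1 := root.AddLink("method", "GET")
	n2 := root.AddLink("method", "GET")

	// ...so equality is a cheap pointer comparison and no read lock
	// is needed once a node exists.
	fmt.Println(n1 == n2) // expected: true
}
```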
Closes #1831
Hmm, I'd definitely want to see more benchmarks and tests before we merge something like this :thinking: Considering we'll have just a single copy of the tags per `TimeSeries`, I don't think we need to replace a slice with such a much more complicated struct; the memory savings won't be as impressive as what you claim. It can probably reduce the allocations if you construct the final tag set gradually, at the cost of lock contention, but not as it's currently used in this PR (since you have already constructed the map you pass to `NewSampleTags()`).

Besides, I have only skimmed the atlas code, but I am not sure it works the way we'd need it to. A tag set with keys `{a: b, c: d}` should be equivalent to a tag set with the same keys in a different order, `{c: d, a: b}`, which I don't think is the case now.
> Hmm, I'd definitely want to see more benchmarks and tests before we merge something like this

We are definitely not just merging this. I would even expect us to rewrite most of the code to not have a global root, but to keep it in the `metrics.Registry`, before we even think of merging.

I would also expect at least some of the code to use the root node directly (the HTTP path definitely) instead of `NewSampleTags`, as that will also be a lot more performant than creating a map and then turning it into an atlas node.
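The difference between the two construction paths would look roughly like this (a hedged sketch; `root` stands for the shared atlas root node):

```go
// Direct construction: chain from the root; each AddLink either reuses
// an existing node or allocates the tag pair exactly once.
node := root.AddLink("method", "GET").AddLink("status", "200")

// Via NewSampleTags: a throwaway map is allocated first, and converting
// it still has to walk the same links afterwards, so the map is pure
// overhead on the hot path (and its iteration order is randomized).
tags := metrics.NewSampleTags(map[string]string{
	"method": "GET",
	"status": "200",
})
_, _ = node, tags
```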
> Considering we'll have just a single copy of the tags per `TimeSeries`

How do we build and keep only one copy? From the code I have seen so far, we always build a new slice of tags, hash it, and then fetch the canonical one from the registry. So we still made a new copy; we just threw it away in the meantime.
> Besides, I have only skimmed the atlas code, but I am not sure it works the way we'd need it to. A tag set with keys `{a: b, c: d}` should be equivalent to a tag set with the same keys in a different order, `{c: d, a: b}`, which I don't think is the case now.

It should, but you can test it and open an issue if it doesn't ;)
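A minimal version of that test could look like this (assuming only the `New` and `AddLink` API from the atlas repository):

```go
package main

import (
	"fmt"

	"github.com/mstoykov/atlas"
)

func main() {
	root := atlas.New()

	// {a: b, c: d} built in two different insertion orders from the same root.
	n1 := root.AddLink("a", "b").AddLink("c", "d")
	n2 := root.AddLink("c", "d").AddLink("a", "b")

	// If atlas normalizes the insertion order internally, both routes end
	// at the exact same node, so a pointer comparison is sufficient.
	fmt.Println(n1 == n2) // should print true if the sets are equivalent
}
```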
To be honest, I would expect the reduction in memory to be negligible for most real use cases, but the reduction in GC time in those same cases should be significant once we use this directly, building from `lib.State#Tags` instead of going from the root node up for each set.

One of the points of this structure is that it "memorizes" how it was constructed and optimizes the same route for the next time. So as long as the code that generates it takes the same route, it will be faster the following times. But building it from a map makes it more likely that the route is randomized.

I would still like to see whether it makes a difference even like that, and in which direction, with just a fairly basic test. I would expect this to be slightly ... worse, before we stop generating the whole map and instead use the `atlas.Node` for that as well.
@na-- @mstoykov @oleiade I pushed a new iteration with a basic Atlas integration. Please take a look and let me know your opinions. I expect several points of discussion, so I didn't complete the entire migration; in any case, the `js/http` package should work as expected.
Updates:

- Restored `lib.State.Tags` and renamed `metrics.TagMap` to `metrics.TagSet`
- `metrics.TagSet` is now always lock-free
- `lib.RunTags` changed from `SampleTags` to `map[string]string`
- Root set in `PreInitState`
- Resolved `RunTags` set in `TestState`
Note: as you can see from the failing xk6 test, we are breaking extensions by removing the ability to create a `SampleTags` from a map.
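For extensions that still start from a map, a small compatibility helper could bridge the gap. A hypothetical sketch (`tagSetFromMap` and the `Add` method are illustrative, not a final k6 API):

```go
// tagSetFromMap branches a new immutable set off the given root, adding
// the map's pairs one by one. Map iteration order is random, so this
// relies on equal sets built in different orders ending up at the same
// node (see the equivalence discussion above).
func tagSetFromMap(root *metrics.TagSet, m map[string]string) *metrics.TagSet {
	ts := root
	for k, v := range m {
		ts = ts.Add(k, v)
	}
	return ts
}
```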
@codebien, @mstoykov and I had a call to discuss some of the important details in this PR and here's what we decided:
- We won't have a `metrics.TagSet` with an internally mutating `atlas.Node`; instead we'll have an immutable `TagSet` that redefines `atlas.Node` like this (see the usage sketch after this list):

  ```go
  type TagSet atlas.Node

  func (tg *TagSet) Add(name, value string) *TagSet {
      return (*TagSet)((*atlas.Node)(tg).AddLink(name, value))
  }

  func (tg *TagSet) Delete(name string) *TagSet {
      // This has to go through the underlying atlas.Node (DeleteKey);
      // calling tg.Delete() here would just recurse into itself.
      return (*TagSet)((*atlas.Node)(tg).DeleteKey(name))
  }

  // ... + wrappers for other atlas.Node methods that return a *Node or have unsuitable names
  // ... + JSON methods from SampleTags
  ```
- We'll completely get rid of `metrics.SampleTags`; the JSON marshaling/unmarshaling methods will be added to `metrics.TagSet`, and the `metrics.Sample` struct will look somewhat like this (after non-indexable tags):

  ```go
  type Sample struct {
      Metric   *Metric
      Time     time.Time
      Tags     *TagSet
      Metadata map[string]string // could be nil
      Value    float64
  }
  ```
- Naming things in the `metrics.Registry` will be simpler because they are now immutable, e.g.:

  ```go
  type Registry struct {
      metrics    map[string]*Metric
      l          sync.RWMutex
      rootTagSet *atlas.Node
  }

  func NewRegistry() *Registry {
      return &Registry{
          metrics: make(map[string]*Metric),
          // All new TagSets must branch out from this root, otherwise
          // comparing them and using their Equals() method won't work correctly.
          rootTagSet: atlas.New(),
      }
  }

  // ...

  func (r *Registry) RootTagSet() *TagSet {
      return (*TagSet)(r.rootTagSet)
  }
  ```
- If we have a lot of tags being created somewhere, e.g. when emitting the HTTP metrics, we can easily create accumulator-like helpers for that task. Whether we export them or make them global will be decided on a case-by-case basis. For example, there is probably some benefit to having a helper with the system tags in the above situation:

  ```go
  tags := NewSystemTagAccumulator(t.tags, t.state.Options.SystemTags)
  // ...
  tags.AddSystemTag(metrics.TagMethod, unfReq.request.Method)
  // ...
  tags.AddSystemTag(metrics.TagStatus, unfReq.response.StatusCode)
  // ...
  ```
- `SortAndAddTags()` is another such helper; it is still needed and can be either a `TagSet` method or a standalone function.
@mstoykov, @codebien, am I missing anything?
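To make the above concrete, here is a hedged usage sketch of the immutable `TagSet` (following the method names sketched in the list, which aren't final):

```go
registry := metrics.NewRegistry()
root := registry.RootTagSet()

// Every Add returns a new *TagSet; the receiver is never mutated.
base := root.Add("scenario", "default")
perReq := base.Add("method", "GET").Add("status", "200")

// base still contains only the scenario tag, while perReq shares its
// nodes; two sets built from the same root with the same pairs are
// equal by a simple pointer comparison.
fmt.Println(base != perReq) // true
```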
As an addendum to the previous note, we realized that we can make a tiny change to `metrics.Sample` and group the `Metric` and `Tags` fields into their own simple `TimeSeries` struct and use that in `Sample`, like this:
```go
type TimeSeries struct {
    Metric *Metric
    Tags   *TagSet
}

type Sample struct {
    TimeSeries
    Time     time.Time
    Metadata map[string]string // could be nil
    Value    float64
}
```
This doesn't change anything significant for `Sample`, and the `TimeSeries` type is a plain naked struct without any methods. However, it will allow us to go a long way towards addressing https://github.com/grafana/k6/issues/2580. Because of the way comparison works in Go, you'd be able to use `==` to compare the `TimeSeries` values of two metric `Sample`s, or use them as map keys. From https://go.dev/ref/spec#Comparison_operators:

> Pointer values are comparable. Two pointer values are equal if they point to the same variable or if both have value nil. Pointers to distinct zero-size variables may or may not be equal. ... Struct values are comparable if all their fields are comparable. Two struct values are equal if their corresponding non-blank fields are equal.

We might still have a central registry for `TimeSeries` in the future (e.g. to be able to count exactly how many unique ones we have), but if we make this small change now, it's likely we won't need another breaking change to `metrics.Sample` in the future (:crossed_fingers: :sweat_smile:).
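For example, counting unique series becomes trivial (a sketch; `samples` is assumed to be a `[]Sample`):

```go
// TimeSeries has only comparable (pointer) fields, so it can be used as
// a map key directly, without us having to hash the tags ourselves.
counts := make(map[TimeSeries]uint64)
for _, s := range samples {
	counts[s.TimeSeries]++
}
fmt.Printf("unique time series: %d\n", len(counts))
```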
FYI, we almost inadvertently introduced a VU activation bug with the migration to atlas. Changes to `vu.tags` (from `k6/execution`) shouldn't persist between scenarios, and they don't in `master`, where we get our activation base tags from the main test's `runTags`: https://github.com/grafana/k6/blob/5b70e70b657750234651a337c1b736c177926855/js/runner.go#L613

But we changed it to get them from the `VU.State`: https://github.com/grafana/k6/blob/6fbde0ab9a35dd6520b83acbfd482477438e2147/js/runner.go#L616

And I perpetuated the problem in https://github.com/grafana/k6/pull/2638 with https://github.com/grafana/k6/blob/e1a641a1aa0c82d1d2f8db77611ccf4ba9bf301b/js/runner.go#L638-L639

I noticed this while working on the non-indexable tags. I've fixed it and added a test and an explicit comment; see the latest force-pushes (since I chose to fix the older commit that perpetuated the problem): https://github.com/grafana/k6/blob/1c8b7e9f758fcaecf6cb60a2e6938fcfd07c34b8/js/runner.go#L618-L621
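The shape of the fix is roughly this (a hedged sketch, not the committed code; `testRunTags` and `params` are stand-ins):

```go
// Base the activation tags on the immutable, test-wide run tags rather
// than on the VU's current state, so tags set via k6/execution in one
// scenario cannot leak into the next activation.
tags := testRunTags // *metrics.TagSet branched from the registry root
for k, v := range params.Tags {
	tags = tags.With(k, v) // returns a new set, leaving testRunTags intact
}
```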
I also added my opinion regarding the With/Without naming here: https://github.com/grafana/k6/pull/2594#discussion_r947724265
@olegbespalov I added your suggestions
@codebien there is a merge conflict
At this point I'm considering squashing when we merge, so I just merged the current master branch here, to avoid any possibility of breaking this branch in some way by rebasing. As a reminder, otherwise the `xk6-websockets` extension will be broken:

- from now on, rebasing this PR is not allowed
- to merge this PR, use the classic Merge feature to keep the commit history