metrictank groupByTags tag parsing adds bad tags

Describe the bug

groupByTags builds a buffer of ; delimited tags to form the key and then groups all series with the same key together. To get the individual tag values back, it calls SetTags() on the result series.

Given input from a "complex" pipeline like:

groupByTags(
  divideSeriesLists(
    sortByName(
      seriesByTag('namespace=os', 'cluster=clusterName', 'name=cpu.percent.user'),
    false), 
    removeBelowValue(
      sortByName(
        seriesByTag('namespace=os', 'cluster=clusterName', 'name=cpu.percent.idle'),
      false),
    10)
  ),
'sum', 'name')

and let's say the input series just has the cluster tag. After going through SetTags() we get tags like: name=divideSeries(cpu.percent.user.g cluster=clusterName namespace=os, 10))

groupByTags should have just produced the name tag but because it calls SetTags() and the target after divideSeriesLists had values like divideSeries(cpu.percent.user;cluster=clusterName;namespace=os,removeBelowValue(cpu.percent.idle;cluster=clusterName;namespace=os, 10)) it just naively splits on ; and adds some bogus tags.

Oct 07 '20 13:10 shanson7

Perhaps groupByTags should be "smarter" in building the key to prevent this issue instead of relying on string encoding/parsing?

Oct 07 '20 13:10 shanson7

I believe that the following query, a little simpler perhaps, exhibits the same issue:

groupByTags(divideSeriesLists(seriesByTag('name=random1'), seriesByTag('name=random2')), 'sum', 'name')

Nov 14 '20 00:11 fkaleo

@shanson7 do you feel the following 2 tests adequately cover the problem (if not, can you clarify):

func TestGroupByTagsSingleSeriesWithFuncs(t *testing.T) {
        in := []models.Series{
                getModel("foo(name1;t1=v1,bar(name1;t1=v1, 123))", a),
                getModel("foo(name1;t1=v1,bar(name1;t1=v1, 123))", a),
        }
        out := []models.Series{
                getModel("name1;t1=v1", sumab),
        }

        out[0].Datapoints = out[0].Datapoints[:0]
        aggFunc := getCrossSeriesAggFunc("sum")
        aggFunc(in, &out[0].Datapoints)

        testGroupByTags("SingleSeriesSum", in, out, "sum", []string{"t1"}, nil, t)
}
func TestGroupByTagsSingleSeriesWithFuncs2Tags(t *testing.T) {
        in := []models.Series{
                getModel("foo(name1;t1=v1;t2=v2,bar(name1;t1=v1;t2=v2, 123))", a),
                getModel("foo(name1;t1=v1;t2=v2,bar(name1;t1=v1;t2=v2, 123))", a),
        }
        out := []models.Series{
                getModel("name1;t1=v1;t2=v2", sumab),
        }

        out[0].Datapoints = out[0].Datapoints[:0]
        aggFunc := getCrossSeriesAggFunc("sum")
        aggFunc(in, &out[0].Datapoints)

        testGroupByTags("SingleSeriesSum", in, out, "sum", []string{"t1"}, nil, t)
}

I think we're going about this the wrong way.

What's not clear to me is how to generalize this. if any series target can be any arbitrary mixture of function calls and various series (each with possibly different sets of tags, both in terms of the tag names as well as values), then what are the tags of said series? @fkaleo's PR mentions " SetTags should consider as tags only the tag expressions (ie. ;tag=value) that are found after the end of the last function of the target.". Why those particular tags? this feels completely arbitrary. I don't think we can really assume how this should be reconciled ?

Furthermore, I think we need to be mindful that groupByTag not only needs a proper set of tags, but also a proper name. I think a "name" like foo(name1;t1=v1,bar(name1;t1=v1, 123)) (which is what serie.Tags["name"] is in one of the tests above) is not useful, at least not if we're going to cleanly reconcile the tags already (see below), then we can do the same for the name.

If by the time groupByTags runs, we then need to do string parsing to figure out what the tags are of the input (and name), that feels wrong. rather, we should probably have each intermediate step somehow set tags and names appropriately, such that groupByTags doesn't have to call SetTags to begin with.

So perhaps we need to figure out/implement https://github.com/graphite-project/graphite-web/issues/2639 first.

Jan 18 '21 14:01 Dieterbe

@shanson7 do you feel the following 2 tests adequately cover the problem

Maybe? I'd like to see those tests using the name tag and a case using another tag as well.

if any series target can be any arbitrary mixture of function calls and various series (each with possibly different sets of tags, both in terms of the tag names as well as values), then what are the tags of said series?

The tags of the series are well known, they are in the Tags member. Assuming we already have tags, we shouldn't need to parse tags out of anything. AFAIK, the only issue is with the name tag, which can be made complex with function calls and combinations. Other tags either exist or don't (AFAIK, there are no other tag manipulation functions like aliasTag).

So, I'm not sure that I would consider this a bug in SetTags (which seems to be what #1948 seems to be treating it as) but rather improper usage of SetTags.

Jan 18 '21 15:01 shanson7

metrictank metrictank copied to clipboard

groupByTags tag parsing adds bad tags

metrictank
metrictank copied to clipboard