pcp
{ instance.name == "foo" } incorrectly matches all instances
For example:
$ pmseries 'kernel.percpu.cpu.idle{instance.name == "cpu1"}[count:1]'
6c296277a0db2b778a4f89a2b706146d963f5052
[Mon Dec 7 16:38:27.087252000 2020] 901255970 cd0958474b0235b72efa7848dc9b5bd8174b155d
[Mon Dec 7 16:38:27.087252000 2020] 903396100 8dd4b5b5842bd9d7710bf10ef08f0556028282b5
[Mon Dec 7 16:38:27.087252000 2020] 903402000 1d3c0a4a4b91e4b0c72e7b26436737ca113fef84
[Mon Dec 7 16:38:27.087252000 2020] 903168350 108d9d807f3064eb6d385fbb6c3d69bbced1635e
[Mon Dec 7 16:38:27.087252000 2020] 907312680 3ca3ec7fe6e0a6e07a18881fce14691d7abb5ac9
[Mon Dec 7 16:38:27.087252000 2020] 907049000 b98db5db28774d35847841db6cce016d2ff5d4ca
[Mon Dec 7 16:38:27.087252000 2020] 906563660 472291797406cec9192e024d118cf1f94c6d5ccf
[Mon Dec 7 16:38:27.087252000 2020] 907306760 36b285b22101f5ca015ce6f5120f9bec2779f8f7
$ pmseries -l 6c296277a0db2b778a4f89a2b706146d963f5052
6c296277a0db2b778a4f89a2b706146d963f5052
inst [0 or "cpu0"] labels {"agent":"linux","device_type":"cpu","domainname":"localdomain","groupid":384,"hostname":"shack","indom_name":"per cpu","userid":386}
inst [1 or "cpu1"] labels {"agent":"linux","device_type":"cpu","domainname":"localdomain","groupid":384,"hostname":"shack","indom_name":"per cpu","userid":386}
inst [2 or "cpu2"] labels {"agent":"linux","device_type":"cpu","domainname":"localdomain","groupid":384,"hostname":"shack","indom_name":"per cpu","userid":386}
inst [3 or "cpu3"] labels {"agent":"linux","device_type":"cpu","domainname":"localdomain","groupid":384,"hostname":"shack","indom_name":"per cpu","userid":386}
inst [4 or "cpu4"] labels {"agent":"linux","device_type":"cpu","domainname":"localdomain","groupid":384,"hostname":"shack","indom_name":"per cpu","userid":386}
inst [5 or "cpu5"] labels {"agent":"linux","device_type":"cpu","domainname":"localdomain","groupid":384,"hostname":"shack","indom_name":"per cpu","userid":386}
inst [6 or "cpu6"] labels {"agent":"linux","device_type":"cpu","domainname":"localdomain","groupid":384,"hostname":"shack","indom_name":"per cpu","userid":386}
inst [7 or "cpu7"] labels {"agent":"linux","device_type":"cpu","domainname":"localdomain","groupid":384,"hostname":"shack","indom_name":"per cpu","userid":386}
Nathan mentioned the fix probably requires an additional pass over the parse tree.
Note that qa/1886 has some subtests that will need to be remade once this issue is fixed, e.g.:
Verify min/max() functions for a non-singular metric, only one instance
pmseries $args 'kernel.all.load{instance.name == "5 minute"}[count:5]'
which incorrectly selects all instances of the load average metric:
fecd5a4b4c6e1273eaa001287a6dd57b7bbd19f7
[Mon Oct 3 09:10:24.305845000 2011] 0.000000e+00 59181b1de54ff2b383cfd1cdd8636f86c880b69b
[Mon Oct 3 09:10:24.305845000 2011] 2.000000e-02 ab010c7d45145aa33c8f8fa681a68c9d4102ae19
[Mon Oct 3 09:10:24.305845000 2011] 5.000000e-02 9d418095c9f971ff4fd44d6828ead27f9d021dc3
[Mon Oct 3 09:10:23.802930000 2011] 0.000000e+00 59181b1de54ff2b383cfd1cdd8636f86c880b69b
[Mon Oct 3 09:10:23.802930000 2011] 2.000000e-02 ab010c7d45145aa33c8f8fa681a68c9d4102ae19
[Mon Oct 3 09:10:23.802930000 2011] 5.000000e-02 9d418095c9f971ff4fd44d6828ead27f9d021dc3
[Mon Oct 3 09:10:23.300460000 2011] 0.000000e+00 59181b1de54ff2b383cfd1cdd8636f86c880b69b
[Mon Oct 3 09:10:23.300460000 2011] 2.000000e-02 ab010c7d45145aa33c8f8fa681a68c9d4102ae19
[Mon Oct 3 09:10:23.300460000 2011] 5.000000e-02 9d418095c9f971ff4fd44d6828ead27f9d021dc3
[Mon Oct 3 09:10:22.959242000 2011] 0.000000e+00 59181b1de54ff2b383cfd1cdd8636f86c880b69b
[Mon Oct 3 09:10:22.959242000 2011] 2.000000e-02 ab010c7d45145aa33c8f8fa681a68c9d4102ae19
[Mon Oct 3 09:10:22.959242000 2011] 5.000000e-02 9d418095c9f971ff4fd44d6828ead27f9d021dc3
Nathan mentioned the fix probably requires an additional pass over the parse tree.
Does "the parse tree" refer to https://github.com/performancecopilot/pcp/blob/main/src/libpcp_web/src/query_parser.y#L205, or to something different?
Yep, thereabouts - that's a pointer to the yacc code that implements the language parsing. It builds a tree in memory of the query expression. If you follow the logic below the pmSeriesQuery function, you'll see there's a point where it recurses over the parse tree dealing with each node and its children. This process needs to be changed to deal with instances for any metrics used, looking up instance-to-series identifiers and using them to refine the result.
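To make the idea of the extra pass a bit more concrete, here is a rough Python sketch. It is not the libpcp_web C code and does not mirror its data structures: the `Node` class, the `instance_names` and `refine` helpers, the `series-cpuN` identifiers and the sample values are all hypothetical, made up for illustration. It only handles `instance.name == "..."` qualifiers and treats several of them as a union, which is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """One node of a hypothetical query parse tree (not the real libpcp_web type)."""
    kind: str                       # "metric" or "eq" in this sketch
    key: str = ""                   # for "eq": the qualifier name, e.g. "instance.name"
    value: str = ""                 # for "eq": the string compared against
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# (timestamp, instance series identifier, value); all values here are made up.
Sample = Tuple[str, str, int]


def instance_names(node: Optional[Node]) -> Set[str]:
    """Recurse over the tree and collect every `instance.name == "..."` value."""
    if node is None:
        return set()
    found: Set[str] = set()
    if node.kind == "eq" and node.key == "instance.name":
        found.add(node.value)
    return found | instance_names(node.left) | instance_names(node.right)


def refine(tree: Node, inst_series: Dict[str, str],
           samples: List[Sample]) -> List[Sample]:
    """Keep only samples whose instance series matches a requested instance name."""
    names = instance_names(tree)
    if not names:
        # No instance qualifier in the query: leave the result untouched.
        return samples
    # Look up instance name -> instance series identifier; unknown names match nothing.
    wanted = {inst_series[name] for name in names if name in inst_series}
    return [sample for sample in samples if sample[1] in wanted]


# Hypothetical tree for: kernel.percpu.cpu.idle{instance.name == "cpu1"}
query = Node("metric", value="kernel.percpu.cpu.idle",
             left=Node("eq", key="instance.name", value="cpu1"))

# Placeholder identifiers standing in for the real SHA-1 instance series IDs.
inst_series = {"cpu0": "series-cpu0", "cpu1": "series-cpu1", "cpu2": "series-cpu2"}
samples = [("t0", "series-cpu0", 10),
           ("t0", "series-cpu1", 20),
           ("t0", "series-cpu2", 30)]

print(refine(query, inst_series, samples))
```

The point of the sketch is only the shape of the change: without the `refine` step every instance of the metric is returned, which is the behaviour reported above; with it, the result is narrowed to the instance series that the qualifier names.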
I see. Thank you for the comment.