gnocchi icon indicating copy to clipboard operation
gnocchi copied to clipboard

Group measures by resource metadata history

Open pedro-martins opened this issue 4 years ago • 11 comments

Problem description

When we are aggregating measures using Gnocchi, we can group the measures by resource's metadata. However, Gnocchi does not group the measures taking into account the resource's metadata history; so if some resource's metadata changed recently, it will be grouped using only the last metadata value.

Imagine the following situation:

I have a metric with metadata called "name", so I take a resource that is monitored by this metric and I change the resource's name from "test" to "test1" at 14:45:00. Then I call the Gnocchi's Aggregation API with an interval from 11 to 17 hours grouping by "name" and it returns this:

group name timestamp granularity value
name: test1 aggregated 2020-03-26T11:00:00+00:00 3600.0 101
name: test1 aggregated 2020-03-26T12:00:00+00:00 3600.0 104
name: test1 aggregated 2020-03-26T13:00:00+00:00 3600.0 98
name: test1 aggregated 2020-03-26T14:00:00+00:00 3600.0 100
name: test1 aggregated 2020-03-26T15:00:00+00:00 3600.0 97
name: test1 aggregated 2020-03-26T16:00:00+00:00 3600.0 109

Here we have a problem. For the system consuming Gnocchi API, it will look like the resource has the same attributes in all of the timestamps, when that is not true. This can have a negative impact on billing systems that rely on Gnocchi as the time series database (e.g. CloudKitty).

Proposal

I propose to allow users to specify if they want or not to group their measurements taking into account the resource's metadata history; if they want, they can just add a "use_history=true" parameter in their API request and Gnocchi will do it for them. If users do not specify the "use_history" in the API, the following property will be used instead.

[api]
group_measures_using_history = True/False

If the property or the query parameter are not defined, the "use_history" value assumes False (for backward compatibility). The new response expected with this proposal using the same situation listed in Problem description will be (using group_measures_using_history=true):

group name timestamp granularity value
name: test aggregated 2020-03-26T11:00:00+00:00 3600.0 101
name: test aggregated 2020-03-26T12:00:00+00:00 3600.0 104
name: test aggregated 2020-03-26T13:00:00+00:00 3600.0 98
name: test aggregated 2020-03-26T14:00:00+00:00 3600.0 75
name: test1 aggregated 2020-03-26T14:00:00+00:00 3600.0 25
name: test1 aggregated 2020-03-26T15:00:00+00:00 3600.0 97
name: test1 aggregated 2020-03-26T16:00:00+00:00 3600.0 109

As you can see, there are 2 entries at 14:00; it happens because the resource's name was changed at 14:45, that is equivalent to 75% of the granularity window (that is 1 hour), so the measure named "test" will take 75% of the total measurement of the granularity window, and the measure named "test1" will take the remaining 25% of the measurement value.

pedro-martins avatar Mar 26 '20 23:03 pedro-martins

Hi and thank you guys! Right now the number one issue is https://github.com/gnocchixyz/gnocchi/issues/1049 as you might have seen.

I don't see the CI running on this PR so I feel like it really does not work anymore or has been disabled?

I feel like there's really some work to be done here before we can merge anything.

This PR seems quite large so it might require some time to review, especially since nobody's usually around.

jd avatar Apr 01 '20 06:04 jd

quite a large change, can you please recheck/reopen to get a new CI check and we can try getting this back on the road.

tobias-urdin avatar Sep 15 '20 16:09 tobias-urdin

@mergify rebase

chungg avatar Oct 02 '20 18:10 chungg

Command rebase: success

Branch has been successfully rebased

Hey, I reacted but my real name is @Mergifyio

mergify[bot] avatar Oct 02 '20 18:10 mergify[bot]

Do you guys know any problem, with lib pyparsing? seems to be breaking operations that contains the in operator like in test self._do_test('foo in ["null", "foo"]', {"in": {"foo": ["null", "foo"]}})

PS: the errors that happen in CI don't happen locally.

pedro-martins avatar Jan 21 '22 15:01 pedro-martins

Possible that something changed with the new pyparsing version just released https://github.com/pyparsing/pyparsing/releases - I don't have the time investigate this straight away though, will check later.

tobias-urdin avatar Jan 21 '22 16:01 tobias-urdin

Can this conflict be resolved and then we can make have a another look at this and merge it? Thanks!

tobias-urdin avatar Mar 13 '22 20:03 tobias-urdin

pyparsing changed the output so this needs a change like was done in https://github.com/gnocchixyz/gnocchi/pull/1185/commits/70da17a90a13151a307c51bc26bf6994c1ff222a

tobias-urdin avatar Mar 14 '22 19:03 tobias-urdin

Thanks tobias, fix applied.

pedro-martins avatar Mar 14 '22 22:03 pedro-martins

Please rebase against master as well to get all the changes for the CI to work as well.

tobias-urdin avatar Mar 28 '22 13:03 tobias-urdin

@jd will merge this in the next couple of days if you don't have any feedback

tobias-urdin avatar Mar 28 '22 20:03 tobias-urdin

if i'm honest, i'm not entirely sure this is the ideal implementation or behaviour but i'm also not as involved anymore and can see it's a valid use case so am not against it. if others are ok with it, i would suggest docs and some api tests.

I added some gabbit tests and Rest API docs regarding the changes.

pedro-martins avatar Sep 29 '22 17:09 pedro-martins