Skyline: question about abilities

I intend to test Skyline for monitoring anomalous CPU usage across several instances hosted in the company's data center.

I fully understand how to configure Skyline to find anomalous behavior in a server's CPU usage relative to its own timeseries.

What I want to achieve is to detect anomalous behavior of one instance relative to the rest of the servers over time. This would help us flag unwanted behavior in a distributed system that allocates more work (or harder computation) to that instance.

For example: the mean CPU usage across all servers is 30%, and one instance has now been at 40% for 1 hour. If this behavior is checked only against the instance's own timeseries, then after 1 hour at that level it will be flagged as normal.
In my case, because the 40% would be compared against the 30% of the other servers, it would be flagged as anomalous for the entire cycle.

Feb 07 '18 15:02 asafcombo

That's an interesting use case - I am not sure Skyline supports that in particular, as all the timeseries are intended to be compared to themselves, not to other timeseries.

Perhaps as a hacky fix, you could make a composite timeseries, consisting of the average of all the machines? It's not exactly what you need but it could get you partially there.
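A minimal sketch of the composite idea, assuming each server's series is a list of (timestamp, value) pairs on a shared timestamp grid. The function name `composite_mean` and the server names are hypothetical and not part of Skyline; the resulting series would still have to be fed to Skyline as an ordinary metric.

```python
from collections import defaultdict
from statistics import mean


def composite_mean(series_by_server):
    """Average several (timestamp, value) series into one composite series.

    Hypothetical helper, not a Skyline function. Timestamps missing from
    some servers are averaged over the servers that do report them.
    """
    buckets = defaultdict(list)
    for series in series_by_server.values():
        for timestamp, value in series:
            buckets[timestamp].append(value)
    # One composite datapoint per timestamp, in chronological order.
    return [(ts, mean(values)) for ts, values in sorted(buckets.items())]


# Illustrative CPU percentages for three made-up servers.
fleet = {
    "server-a": [(0, 30.0), (60, 31.0)],
    "server-b": [(0, 29.0), (60, 30.0)],
    "server-c": [(0, 31.0), (60, 41.0)],
}
composite = composite_mean(fleet)
```

Note that the outlier itself pulls the average up, which is one reason this only gets partially there; a median would be less sensitive to a single busy server.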

Feb 07 '18 15:02 astanway

On the other hand, you could also write a custom algorithm and hard-code the other timeseries that you want to compare the current one to. That should do the trick.
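A rough sketch of what such a comparison could look like, written as a plain function that returns True when the series looks anomalous. Everything here is an assumption for illustration: the name `deviates_from_peers`, the explicit `peer_series` argument, and the `window`, `threshold` and `min_spread` parameters are hypothetical, and how the peer series would be loaded inside Skyline is not shown.

```python
from statistics import median


def tail_mean(timeseries, window):
    """Mean of the last `window` values of a (timestamp, value) series."""
    values = [value for _, value in timeseries[-window:]]
    return sum(values) / len(values)


def deviates_from_peers(timeseries, peer_series, window=3,
                        threshold=3.0, min_spread=1.0):
    """Return True if the recent level is far from the peers' median level.

    Hypothetical algorithm, not part of Skyline. Distance is measured in
    units of the peers' median absolute deviation, floored at `min_spread`
    so that near-identical peers do not make tiny differences look large.
    """
    own = tail_mean(timeseries, window)
    peers = [tail_mean(series, window) for series in peer_series]
    center = median(peers)
    spread = max(median([abs(p - center) for p in peers]), min_spread)
    return abs(own - center) / spread > threshold


def flat(level):
    """Build a constant three-point series for the example below."""
    return [(0, level), (60, level), (120, level)]


peers = [flat(29.0), flat(30.0), flat(31.0), flat(30.0)]
busy_flagged = deviates_from_peers(flat(40.0), peers)
normal_flagged = deviates_from_peers(flat(31.0), peers)
```

Because the comparison is against the other servers rather than the instance's own history, the 40% instance stays flagged for as long as it remains above its peers.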

Feb 07 '18 15:02 astanway

Hard-coding won't suffice, as the server IDs can be changed by the resource manager.

I think what I could do is change the spin_process function in analyzer.py

so that it takes several raw_series (one for each server) at the same time and works on them together. But I imagine that will require some more work: I can't have this happen for each metric, because if I have 40 servers then I also have 40 metrics, and each time I check one metric I would have to check the other 39 as well, which is redundant.

Feb 07 '18 16:02 asafcombo

@asafcombo, @astanway is correct: it could be done with some customisation. As for hard-coding the server metric names, you could instead match on the namespace, so that server ID changes are not an issue as long as you have a common namespace for servers, e.g. metrics.servers.<server_id>.*
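As an illustration of matching on the namespace, here is a small sketch that groups metric names by everything after the server ID, so the set of servers is discovered from the names rather than hard-coded. The pattern follows the metrics.servers.<server_id>.* example; the helper name `group_by_metric` and the sample metric names are made up.

```python
import re
from collections import defaultdict

# Assumed layout: metrics.servers.<server_id>.<rest of metric path>
SERVER_METRIC = re.compile(
    r"^metrics\.servers\.(?P<server_id>[^.]+)\.(?P<metric>.+)$"
)


def group_by_metric(metric_names):
    """Map each metric suffix to {server_id: full_metric_name}.

    Hypothetical helper, not part of Skyline. Names outside the server
    namespace are ignored.
    """
    groups = defaultdict(dict)
    for name in metric_names:
        match = SERVER_METRIC.match(name)
        if match:
            groups[match.group("metric")][match.group("server_id")] = name
    return dict(groups)


names = [
    "metrics.servers.web01.cpu.user",
    "metrics.servers.web02.cpu.user",
    "metrics.servers.web01.mem.used",
    "metrics.other.queue.depth",
]
groups = group_by_metric(names)
```

Each group then holds the peers for one kind of metric, whatever server IDs the resource manager happens to have assigned.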

In terms of the cost of checking the metrics, if done properly the penalty incurred should not be too steep, especially if it is only 40 metrics you are talking about compositing.
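One way to keep the cost down, sketched under the assumption that the latest value for every server is available at once: compute the fleet baseline a single time per analysis cycle and score every server against it, instead of re-reading all the other series for each metric. The name `score_fleet`, the sample values and the 5-point cutoff are hypothetical.

```python
from statistics import median


def score_fleet(latest_by_server):
    """Return each server's offset from the fleet median.

    Hypothetical helper, not part of Skyline. The median is computed once
    for the whole fleet, not once per server.
    """
    baseline = median(latest_by_server.values())
    return {server: value - baseline
            for server, value in latest_by_server.items()}


# Illustrative latest CPU percentages for four made-up servers.
latest = {"web01": 30.0, "web02": 29.0, "web03": 31.0, "web04": 40.0}
offsets = score_fleet(latest)
# Arbitrary cutoff of 5 percentage points, chosen only for the example.
outliers = [server for server, offset in offsets.items() if abs(offset) > 5.0]
```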

That said, even though all the servers may be the same, you may find that their metrics normally differ somewhat at times. As @astanway said, this is an interesting use case, and one I have been thinking about myself for some time in terms of metric clustering. I feel it would work better if Skyline learnt related metrics, clustered the namespaces and did it all (or most of it) by itself :) Although I can tell you that it is probably not as easy as it sounds.

Skyline Mirage and Ionosphere may be able to help you out in the interim, until you or I or someone else does that. I maintain an unforked version at https://github.com/earthgecko/skyline, and the additional functionality is outlined here - https://github.com/etsy/skyline/issues/123#issuecomment-355222009

And I shall definitely be looking at adding something similar to what you have outlined here in the not too .... future. I am currently adding an autocorrelations module, and with Skyline learning using autocorrelations and user-defined correlations, the addition of another module to analyse the metric in terms of clustered or composite metric medians, etc., is one of the next logical steps, as you nicely outlined here. I am not certain how well it will work, but it will definitely work in some way :)

Good luck with your endeavours with Skyline.

Feb 07 '18 18:02 earthgecko

@earthgecko thanks. I will take a look to see if there is a quick workaround I can do.

Feb 08 '18 15:02 asafcombo