devlib Add schedstats instrument

Not sure how useful this will actually be hence RFC marker and not too much care or documentation. This basically jimmies /proc/schedstat into the Instrument API. What do you think?

Low priority at the moment, don't spend ages reviewing if you're busy :)

May 11 '17 15:05 bjackman

Latest version

Doesn't fail with the "eas" schedstats data but does spam WARNINGs. Could always just silently ignore "eas" data without a warning, until we add the ability to parse it.
Doesn't use the sysctl unless it's necessary, as the sysctl is v4.6+

May 23 '17 16:05 bjackman

@derkling @bjackman sorry for the delay in following up on this. Is this still needed? Is it ready for merge? If so, please could you rebase onto tip of master.

Sep 11 '17 12:09 setrofim

I think this could be really useful for use with WA3 but I would say it's low priority. It's also just occurred to me that I should think about how to get this to report diffs, i.e. you'd like to take a sample at the beginning and a sample at the end, and have the metric report the diff in schedstats during workload execution (maybe WA already does that?). Also would like some input from @derkling on the EAS stats thing. Are you OK with having this PR sit around a little bit longer?

Sep 11 '17 13:09 bjackman

maybe WA already does that?

WA has a generic "sysfs_extractor", which takes a list of sysfs locations, which it then pulls before and after each iteration, and then performs a generic diff; but it has no awareness of what it's diff'ing, so it doesn't generate any metrics (you just get a directory structure with the diffs).

Are you OK with having this PR sit around a little bit longer?

Yeah, no problem. There is no rush. It's just that there hasn't been any activity on this for a while, so I just wanted to make sure this is still relevant.

Sep 11 '17 13:09 setrofim

I've just refreshed this and plugged it into WA3 as an experiment.. definitely interesting but the only problem is it produces literally hundreds of metrics for each workload!

Oct 11 '17 13:10 bjackman

I've just refreshed this and plugged it into WA3 as an experiment.. definitely interesting but the only problem is it produces literally hundreds of metrics for each workload!

That's not necessarily an issue, if those metrics are actually individually useful. An alternative though, if they're more of a "state dump" for debug/deep-dive analysis, is to write them out to a file that is than added as an Artifact, rather than individually adding them as metrics in WA (or adding only a few key/summary ones as metrics).

Oct 11 '17 14:10 setrofim

Hmm yeah.. my dream for this Instrument is that you can immediately see that your kernel change e.g. "increased the average number of failed load balances" for a certain workload. All the added stats are potentially individually useful, but the particular one that's individually useful is probably going to change in every situation.

On the other hand I've just realised that's only going to happen if you explicitly ask for schedstats data, so maybe it's fine.

There are also a few potential aggregates I can think of (aggregate across CPU/sched_domain/idle type/any combination of those 3) but none of them makes any more sense than the others, so probably best considered a future feature. Subject to a bit more testing this is probably ready for merge.

Oct 11 '17 14:10 bjackman

That sounds good to me. There is nothing in WA which demands or even encourages reducing the number of metrics. We specifically use __slots__ for metrics to minimize the impact of having vast numbers of these objects. And, as you've say, this only matters if the user enables the instrument.

Ultimately, it comes down to what's easier to work with. As you'll likely be, at least initially, the only/the primary user of this instrument, this is for you to decide.

I should also point out that there is also the option of tagging the metrics with some kind of classifier to make it easier to filter/split them from the rest of the results.

Oct 11 '17 15:10 setrofim

OK sweet. Last time I tested this I saw some suspicious values, once I've investigated that, let's go ahead (this is still low priority though so might be a while).

Oct 12 '17 09:10 bjackman

OK I am now using this instrument in anger and getting sane results.

Dec 01 '17 16:12 bjackman

devlib devlib copied to clipboard

Add schedstats instrument

devlib
devlib copied to clipboard