
ldmsd and ldms-aggd need real first-class configuration files

Open morrone opened this issue 5 years ago • 44 comments

I think that ldmsd and ldms-aggd really both need first-class configuration files.

Unless I am mistaken, there is no single unified configuration file for the ldmsd. Some things are configured through the command line options and/or environment variables. For those things, there appears to be no way to put them in a configuration file.

There is a "configuration file" for plugin configuration, but that is really more a script of ldms commands rather than a configuration file.

Here are some of the things that I would expect a first-class configuration file for ldmsd to have (ldms-aggd would be similar):

  • A top-level section devoted to general ldmsd daemon options, such as:
    • networking options
    • logging options
    • thread options
    • paths to relevant files and plugins
  • Sub sections for each plugin

Note that plugin configuration should be configuration, not a list of commands. In particular there should be no "load" or "start" commands in the configuration file. Those are implementation details. The daemon will know to load and start the plugin based on the fact that the plugin is configured in the configuration file.
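
For contrast, here is a sketch of the difference. The command script below reflects the current ldmsd command syntax; the declarative form is purely hypothetical:

# today: a script of commands, with load/start implementation details
load name=meminfo
config name=meminfo producer=host1 instance=host1/meminfo
start name=meminfo interval=1000000 offset=0

# proposed: configuration only; the daemon loads and starts meminfo
# because this section exists
[sampler meminfo]
producer=host1
instance=host1/meminfo
interval=1000000
offset=0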

morrone avatar Oct 22 '19 17:10 morrone

Also, the daemons should look for the configuration file in a known, standard system location by default. This can be overridden by specifying a different configuration file on the command line.
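
For example, something like this (flag and path names purely illustrative):

# reads /etc/ldmsd.conf (or similar) when no file is given
ldmsd

# explicit override for testing or unusual layouts
ldmsd --config=/tmp/test-ldmsd.conf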

morrone avatar Oct 22 '19 17:10 morrone

There is a practical solution to this which is already available: use the genders-based systemd ldms configuration shipped with the ldmsd rpms. It is free of 'load' and 'start' commands. The standard location is /etc/genders. For overrides, which are always needed in exotic environments or for finicky site policies, the /etc/sysconfig/ldms.d/ldms.%I.conf location works with ldmsd.service and [email protected].

So assuming you're already aware of it and find it unsatisfactory, what is it you really wanted?

Background on the systemd/genders tooling. Features:

  • It’s a pile of bash and a dribble of C++ – don’t be afraid to inspect it
  • Startup process of the ldmsd.service unit:
    • ldmsd-pre-system creates the environment file and configuration file for the ldmsd binary
      • /etc/sysconfig/ldms.d/ldmsd loads the .conf file and parses the genders file(s)
        • For aggregators, the ldmsctl_args3 binary analyzes the genders file hierarchy to discover the host, transport, and port associations.
      • Functions in /etc/sysconfig/ldms.d/ldms-functions generate the files placed in /var/run/ldmsd
    • /usr/bin/ldmsd-wrapper.sh starts the daemon using the generated files

  • Systemd unit files /usr/lib/systemd/system/ldmsd[@].service may need site-specific tuning of resource limits.
  • Escape hatches are included to allow for all exotic use cases – skip genders files entirely if you want to supply your own configuration script.
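
As a rough illustration of the shape of the genders-driven configuration (attribute names recalled from memory; see the man pages shipped with the rpms for the authoritative list):

# /etc/genders (fragment)
compute[1-100] ldmsd,ldmsd_port=411,ldmsd_xprt=sock
compute[1-100] ldmsd_metric_plugins=meminfo:vmstat
agg1 ldmsaggd=compute[1-100]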

Design goals:

  1. Simplicity of deployment (a minimum of admin-written files, written quickly following an example)
  2. Scalability (to work at computing center scale)
  3. Do not repeat yourself (to avoid consistency errors)
  4. Make the common uses easy without making the hard uses impossible

New v4 features that have not yet been addressed or explicitly tested in the genders systemd scripts are:

  • Failover
  • Munge authentication
  • Storage groups

baallan avatar Oct 23 '19 12:10 baallan

There is a practical solution to this which is already available-- use the genders-based systemd ldms configuration shipped with the ldmsd rpms.

So assuming you're already aware of it and find it unsatisfactory, what is it you really wanted?

Yes, I am completely aware of the genders solution. It is what we are currently using. But that solution is a wild abuse of the genders system. Genders is not intended to be the fine-grained configuration tool for applications. In order to shoe-horn ldms configuration into genders, we have to translate from normal ldms configuration into an intermediate genders language in our heads. Even in the normal, simple case the result looks like line noise in genders, and it still requires eight (at minimum!) other files to be configured and tracked in our configuration management system.

It was a nice experiment. I understand what you were going for. But I think it is really time to retire that approach and go with a much simpler configuration file approach that I suggest in this ticket.

  • It’s a pile of bash and a dribble of C++ – don’t be afraid to inspect it

It is just not reasonable to ask our sysadmins to dive into that. The genders entries look like line noise, and the winding set of scripts is a pain to follow. I don't want to be a system administrator, so I need a configuration approach that is less unconventional, one that I can reasonably hand off to the operations and system administration teams.

Design goals:

  1. Simplicity of deployment (a minimum of admin-written files, written quickly following an example)

I'm sorry, but this goal was entirely missed in the current approach.

  2. Scalability (to work at computing center scale)

We scale the configuration of many other services with similar configuration complexity using much more straightforward approaches than the one devised for ldms.

  3. Do not repeat yourself (to avoid consistency errors)

But there are at least four different places to do the exact same thing! With this system, not repeating yourself is an exercise in extreme restraint rather than something enforced by design.

  4. Make the common uses easy without making the hard uses impossible

This makes the common use case difficult.

  • Failover
  • Munge authentication
  • Storage groups

All of that seems achievable with a more standard approach to configuration files.

I am happy to iterate on what a good configuration file approach looks like. I hope I'm not being too harsh, but the current genders approach is really not working well for us. We really need to overhaul the configuration approach to bring it more in line with more standard configuration practices.

morrone avatar Oct 23 '19 17:10 morrone

Please suggest an actual declarative configuration syntax and semantics for the '1st class' configuration agent. The current genders implementation represents the minimum needed for ldms v2/v3. I can picture something as easy as 'ldmsd5 --system=$FILE' from the command line (where, if FILE is in the standard place, you don't even need --system).

But what is in the file? A json punctuation of the genders data, where host-expressions still figure prominently in the syntax? XML, with configs written via an LLNL-contributed GUI? Are there 'includes' of some sort? How are the escape hatches handled for exotic situations and bug workarounds?

I agree the genders syntax is limiting and the bash scripting is less than entirely pleasant. That's why at a point it drops into c++ to deal with complex queries.

A nearly comprehensive answer to this question would be to attach to this issue the 'new format' config files you propose to replace the L0/L1/L2 genders configurations you currently use for an actual LLNL cluster (manually translated).

baallan avatar Oct 23 '19 17:10 baallan

I don't particularly care at this stage what the specific configuration file format looks like, but I can give some early guidance.

It needs to be something reasonably human readable and editable with vi. That pretty much rules out XML, and anything that would reasonably require a separate GUI to edit it. Something ini-style, YAML, or something along those lines would be in line with the desired approach.

Here's a first approximation of what I am thinking. I readily admit that I don't have the full scope of configuration requirements in my head yet, so this will necessarily need to go through some design iteration before it is fully workable. But perhaps it will help to spur more discussion:

[main]
port=444
transport=rdma
authfile=/etc/ldms_auth.conf

[sampler dcgm]
interval=1000000
offset=0
fields=105,115,1000,1001,1005,1006
schema_name=my_favorite_gpu_fields
producer=${hostname}

That might very well be all that is needed for a sampler node, perhaps with more plugin sections. More sections would be needed for aggregators and storers, of course.
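
To make that concrete, an aggregator's file might carry sections along these lines (all section and option names are a strawman, not a proposal for the final vocabulary):

[main]
port=444
transport=rdma

[producer compute]
hosts=alpha[1-100]
port=411
transport=sock
reconnect=20000000

[updater all]
producers=compute
interval=1000000
offset=100000

[store csv]
path=/var/lib/ldms/csv
schemas=meminfo,vmstat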

morrone avatar Oct 23 '19 18:10 morrone

v5 configuration files already support all of this.

tom95858 avatar Oct 28 '19 12:10 tom95858

However the current plan is to change the syntax to JSon. Now is the time for anyone with strong opinions on syntax to weigh in.

tom95858 avatar Oct 28 '19 12:10 tom95858

What were the design tradeoffs that led you to prefer JSON?

oceandlr avatar Oct 28 '19 13:10 oceandlr

"Pros"

  • Human readable
  • Able to represent complex relationships
  • Existing parser for both C, Python, Javascript/NodeJS
  • Suitable as wire protocol

"Cons"

  • Verbose
  • Some people groan when JSon is mentioned

We could certainly have a tool that would generate the JSon from a syntax like what @morrone was suggesting.

Aside from the 'syntax', there is still a lot of 'design' around what the various objects are and how they are encoded.

There is also a notion of how configuration is 'activated', i.e. the configuration is defined as a JSon object, but how is configuration state change done? In today's syntax, we have 'verbs' (start, stop, ...) and 'objects' (prdcr_add, smplr_add, ...) intermingled. start, for example, encodes both state change (idle-->start) and configuration (interval, offset). A goal is to split configuration from state so that a complete, idle configuration can be exchanged with a peer as part of load-balance or fail-over, for example.
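
A sketch of that split, using today's command syntax for the "before" and a purely hypothetical encoding for the "after":

# today: prdcr_add intermingles configuration with the state machine
prdcr_add name=nid0001 host=nid0001 xprt=sock port=411 type=active interval=20000000
prdcr_start name=nid0001

# hypothetical: the object is pure configuration...
{ "producer": { "name": "nid0001", "host": "nid0001", "xprt": "sock", "port": 411, "reconnect": "20s" } }

# ...and state change is a separate action
set producer nid0001 state running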

tom95858 avatar Oct 28 '19 13:10 tom95858

JSON is, as Tom noted, suitable for use as a wire protocol representing binary structures. For ldms we need both more and less than JSON. I will convert a production genders file to demo.

baallan avatar Oct 28 '19 14:10 baallan

json is a data interchange format, so yes of course it works well as wire protocol. But it does not work well for configuration files. In particular, json has no support for comments! Comments should absolutely be a requirement of the configuration file syntax. Further, json wouldn't make it terribly easy to introduce substitution patterns without ugly escaping, or other advanced configuration file capabilities like includes (which @baallan noted as a possibility). We might not need includes on day one, but it would definitely be nice to leave open the possibility.
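
A small illustration of the comment problem: strict JSON can only fake annotations with dummy keys, while an ini/TOML-style file carries them directly.

{ "interval": 1000000, "_comment": "microseconds; keep in sync with store flush" }

interval = 1000000   # microseconds; keep in sync with store flush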

Where can I go to learn about this v5 configuration format that you speak of, @tom95858?

morrone avatar Oct 28 '19 17:10 morrone

I share Chris's desire for something that supports comments and easy editing. Changing JSON always results in a lot of { and [ debugging for me. It is difficult to do diffs also.

oceandlr avatar Oct 28 '19 17:10 oceandlr

Another requirement that just came to mind:

When a bad value is encountered (not just a syntax error) in the configuration file, the daemon needs to be able to return an error saying on what line in which file the error occurred.
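
Something in this spirit (message format purely hypothetical):

ldmsd: /etc/ldmsd.d/01-global-samplers.conf:14: sampler 'meminfo': invalid value '-5' for 'interval' (expected positive integer microseconds)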

morrone avatar Oct 28 '19 17:10 morrone

For use with other tools, we should consider including a requirement that the syntax be trivially mappable to json. I already have a json parser that does line tracking and comment handling. The syntax proposition I have in progress is to accept that for ldms configuration everything is a string and so no double quoting is needed unless we want to protect whitespace or commas. Full example coming, but also somewhat in the spirit of relaxedjson:

cn[1-32] : {   # compute nodes config
  ldmsd: {
    path : /foo
    samplers : bar,baz
    bar_config : "nproc=36 infile=/proc/weirdplace"
  }
}
admin1: {
  ldmsaggd: {
    clients : cn[1-32]
  }
}

baallan avatar Oct 28 '19 18:10 baallan

I agree with @morrone regarding the lack of comments, and @oceandlr regarding the { and ( and the desire for include. So let's assume that there is a new format. I have a few thoughts:

  • There is a 'save-running-config' capability. Does it have to save the comments it may have read when loading the configuration file? What about includes? Saving all this config-meta-data is painful. My vote is "no"

  • I had imagined a 'templating' capability. The user defines a template and then refers to it plus mods for each object based on the template. For example:

[all-producers]:        // this is the producer template
  obj-type: producer
  reconnect: 20s
  connection: active
  port: 411

[nid0001]:
  use: all-producers
  host: nid0001

[nid0002]:
  use: all-producers
  host: nid0002

[nid[0003-1234]]:       // some kind of generator syntax
  use: all-producers
  host: @myname

BTW,

  • we have used YAML as a config file format in the past and my recollection was that it was widely reviled.
  • absent a 'standard' encoding, there won't be syntax highlighting available in vi/emacs
  • 'save-running-config' will produce the fully specified object for each one.
  • There needs to be syntax for moving objects through the state machine, something like: set nid[0000-1234] state running. In YAML perhaps there would be an [action] section that is handled differently.

Although there are benefits to the super simple syntax, there are downsides too: you can't define objects within other objects (nesting), and simple syntax issues like a trailing ',' can give you another kind of config-debugging heartache.

For example:

define updater update_me:
  interval: auto
  producers: a,b,c,
             d, e, f, g,
  schema: biffle

The configuration parser will be looking for a producer named 'schema'. Your error message will be something intuitive like:

line 1234, col 48: producer named 'schema' not found

followed by:

syntax error: expecting keyword, but got ':'

Note that adding comments, include, etc... to our JSon parser is trivial; however, the files will then not be parse-able by compliant parsers like Python's. Javascript, however, is very forgiving.

tom95858 avatar Oct 28 '19 19:10 tom95858

... sorry wiki markdown ate my indents...

tom95858 avatar Oct 28 '19 19:10 tom95858

I would like to see the parts where different nodes are configured in the same file removed. I completely see what you are going for, @tom95858. I think in broad terms the problem is that you are trying to reinvent cluster configuration management in ldms, only for ldms. That does not play terribly well with a site that employs a configuration management approach for all services that they are managing on a cluster.

Instead, what we want is a file that tells one service what to do, with templating/substitution supported.

I'll give the LLNL example, but other sites use other configuration management approaches that would benefit from this design as well.

We store configuration files in cfengine, and then use genders to describe which roles and services each node provides. On most of our clusters, we would have two classes of ldms-related services: samplers and aggregators. In cfengine, we would want to have two files, perhaps named: ldmsd_sampler.conf and ldmsd_aggregator.conf.

In genders we would then have:

alpha[1-1000] ldmssampler
alpha[1001-1004] ldmsaggregator
alpha1001 ldms_aggregate_from=alpha[1-250]
alpha1002 ldms_aggregate_from=alpha[251-500]
alpha1003 ldms_aggregate_from=alpha[501-750]
alpha1004 ldms_aggregate_from=alpha[751-1000]

Or something to that effect. The great thing here is that now the configuration is replicable across many clusters, not just cluster "alpha". A sysadmin can bring up a new cluster with just a few simple rules in genders, while keeping all of the detailed configuration in the configuration files. Most admins won't need to know much about ldms to get it properly running on a new cluster.

But the admins that do know about ldms and want to change it can edit the appropriate ldms configuration file and have it take effect on next restart everywhere that that class of ldms service runs.

Again, this is just LLNL's cfengine+genders approach to cluster configuration management. But there are a number of other configuration management systems out there. The main point I want to get across is that configuration management should be left to the configuration management system, and the ldms configuration file should just configure one ldms daemon clearly and concisely.
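
One sketch of how the launch-time wiring could work: nodeattr is the standard genders query tool, but the wrapper logic and variable names here are hypothetical.

# in a service wrapper, before exec'ing ldmsd on the aggregator
LDMS_AGGREGATE_FROM=$(nodeattr -v ldms_aggregate_from)
export LDMS_AGGREGATE_FROM

# ldmsd_aggregator.conf then references the substituted value
[producer compute]
hosts=${LDMS_AGGREGATE_FROM}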

morrone avatar Oct 28 '19 20:10 morrone

I am going to suggest using TOML as the base configuration language:

https://github.com/toml-lang/toml
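
TOML covers the wish list raised so far, i.e. comments, human-editable nesting, and existing parsers. A small taste (content illustrative):

# comments are first-class in TOML
[main]
port = 411           # values are typed: integer here
transport = "sock"   # strings are quoted

[samplers.meminfo]
interval = 1000000
fields = [105, 115, 1000]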

morrone avatar Oct 28 '19 20:10 morrone

@morrone I like the idea of simplifying genders (if needed at all) down to something indicating which ldmsd instance one wants launched (ldmsd collector, ldmsd agg, etc), but there is a devil in the details. The aggregators (and the bits that monitor them) need to be able to discover (without relying on samplers to connect to them for discovery) what they are supposed to be aggregating in terms of (host addr/port/schedule) and what schemas they ought to expect.

A unified file allows for this easily; it's not so obvious (maybe I'm slow) how this is accomplished with one of the 'conventional' configuration management engines.

Also, as a side note, we don't use cfengine and the like on snl production clusters presently. That might be changing so I'm going to see if we can get one of them to weigh in on this thread.

baallan avatar Oct 29 '19 01:10 baallan

by-the-by I see no reason a change like this couldn't be backported to work with v4/systemd. I doubt it has any implications on the existing command-line language or wire protocols that couldn't be handled with an appropriate preprocessor (much as genders is handled now, or maybe rather more simply).

baallan avatar Oct 29 '19 01:10 baallan

The aggregators (and the bits that monitor them) need to be able to discover (without relying on samplers to connect to them for discovery) what they are supposed to be aggregating in terms of (host addr/port/schedule) and what schemas they ought to expect.

We would just have a configuration file for the aggregators. It lists what schemas to collect from which nodes on which ports, where to store the data, etc. There is no need for that configuration file to be shared with the sampler nodes' configuration file.

Configuration files will only be shared among the nodes where simple pattern substitution will account for the differences. For instance, aggregator A will monitor nodes foo[1-100], aggregator B will monitor nodes foo[101-200]. That node list can be substituted at launch time from genders or another configuration management approach. I will work on mocking up an example aggregator later today.
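
Ahead of that mock-up, the substitution idea in miniature (strawman syntax; ${...} marks the launch-time substitution):

[producer samplers]
hosts=${LDMS_AGGREGATE_FROM}    # e.g. foo[1-100] on aggregator A, foo[101-200] on B
port=411
transport=sock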

morrone avatar Oct 29 '19 16:10 morrone

The aggregators and samplers need to agree on port assignments, transports, and schemas, which is complicated enough to be easy to mismatch if they are defined separately. But perhaps if we get something sensible together for each node class, an ldms-config-lint could be applied to generate warnings about unmatched ports/transports and unaggregated schemas when given a set of node class files.
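
Such a linter might be used like this (tool name and output wholly hypothetical):

$ ldms-config-lint ldmsd_sampler.conf ldmsd_aggregator.conf
warning: samplers listen on transport 'sock' port 411, but no producer section in ldmsd_aggregator.conf matches
warning: schema 'procnfs' is sampled but never aggregated or stored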

baallan avatar Oct 29 '19 17:10 baallan

Having to coordinate things like ports, transports, etc. is not complicated. It is the sort of thing that system administrators do all the time for any number of services running on a cluster.

Putting all of the configuration into a single file with sections for specific hosts would reinvent the wheel just for ldms. It would also integrate poorly with existing cluster configuration management practices. That will really just not work for us.

morrone avatar Oct 29 '19 19:10 morrone

@morrone @tom95858 @gentile I added a proposal page related to this issue of the sort discussed at LDMSCON, where most agreed that something like PEP (python) was perhaps too much process but we need some formalism to help others digest and form opinions. https://github.com/ovis-hpc/ovis/wiki/Proposal-2

@gentile @brandt I added a summary of our desires for a modicum of process at https://github.com/ovis-hpc/ovis/wiki/Proposal-1. Please revise and extend anything I may have missed.

The far bottom right of the index bar on our main page links to the proposal list https://github.com/ovis-hpc/ovis/wiki/OVIS-Change-Proposals

baallan avatar Oct 29 '19 21:10 baallan

I created TOML versions of two of @baallan's "relaxed json" examples. I think I would suggest more changes in the end, but at least they serve as a general example of what the files could look like in TOML.

I don't have access to add to wiki pages, so here they are:

p2-local.admin.toml-v0.txt p2-agg.admin.toml-v0.txt

Also, I would suggest another requirement for the new config format:

  • All options should employ full English words where reasonable. For example: instead of "dbg" use "debug", and instead of "xprt" use "transport".

morrone avatar Oct 30 '19 00:10 morrone

I have added @morrone's examples to the wiki page. I find TOML's double square bracket notation, where repeated [[x]] tables are the equivalent of JSON's "x : [ {a}, {b} ]", a bit disconcerting, but it should be easy for a parser to issue useful warnings/errors about extra/missing brackets.

baallan avatar Oct 30 '19 02:10 baallan

There are already TOML parser libraries available, so basic syntax like that would not be something that we would need to worry about.

morrone avatar Oct 30 '19 04:10 morrone

Another thing that I think we should embrace is the ".d" configuration directory. Perhaps the default location would be "/etc/ldmsd.d". Red Hat (among others) seems to encourage that approach, and it plays well with configuration management systems.

If we do that, it probably avoids the need for an "include" in the config file language.

It does have some implications for how we would construct the configuration file. For instance, we would probably want to get rid of the list of plugins names, and just rely on the fact that a plugin configuration section exists meaning that we want to use that plugin. In other words, we could drop "plugins" from this:

[samplers]
plugins = [
        "jobid",
        "meminfo",
        "vmstat",
        "procnfs",
        "procstat",
        "procnetdev",
        "sysclassib",
]

A basic use case would be when we have a general set of sampler plugins that we want to run on all nodes, for instance meminfo and vmstat. That could go into one file:

# /etc/ldmsd.d/01-global-samplers.conf
[samplers.meminfo]
with_jobid = 1

[samplers.vmstat]
with_jobid = 1

And then some nodes additionally need the gpu sampler:

# /etc/ldmsd.d/20-gpu-sampler.conf
[samplers.gpu]
with_jobid = 1

And then configuration management decides which nodes see just 01-global-samplers.conf, and which nodes get both 01-global-samplers.conf and 20-gpu-sampler.conf.

Higher level options might either go in the 01-global-samplers.conf, or in a separate file like so:

# /etc/ldmsd.d/00-global.conf
port = 3992

[samplers]
default_interval=1000000
default_offset = 0

It would be up to the administrator's personal preference. People can go as far as they want into splitting up their configurations into different files.

This approach seems to play well with existing standard practices.

The only issue I see so far is the potential loss of the "plugins" list (which wouldn't be as easy to maintain with this approach). But the only reason I have heard so far for needing that list is ordering of some kind. I'm not sure what ordering could be significant there, so maybe that really isn't an issue for 99% of users? And perhaps if there is a real need for the remaining 1%, then we can provide an optional "order = N" parameter under each sampler plugin (other plugin types too) to allow explicit ordering that way.
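
That escape hatch might look like this (strawman syntax; the jobid-before-other-samplers ordering is just an example of a plausible dependency):

[samplers.jobid]
order = 1          # start before samplers that tag sets with job data

[samplers.meminfo]
order = 2
with_jobid = 1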

morrone avatar Oct 30 '19 20:10 morrone

On Wed, Oct 30, 2019 at 2:57 PM Christopher J. Morrone <[email protected]> wrote:

Another thing that I think we should embrace is the ".d" configuration directory. Perhaps the default location would be "/etc/ldmsd.d". Red Hat (among others) seems to encourage that approach, and it plays well with configuration management systems.

If we do that, it probably avoids the need for an "include" in the config file language.

It does have some implications for how we would construct the configuration file. For instance, we would probably want to get rid of the list of plugins names, and just rely on the fact that a plugin configuration section exists meaning that we want to use that plugin. In other words, we could drop "plugins" from this:

Top level options such as:

enabled=false

would be nice. If present, this configuration file is "skipped". This seems needed if we're going with the *.d approach.

[samplers]
plugins = [
        "jobid",
        "meminfo",
        "vmstat",
        "procnfs",
        "procstat",
        "procnetdev",
        "sysclassib",
]

An basic use case would be when we have a general set of sampler plugins that we want to run on all nodes, for instance meminfo and vmstat. That could go into one file:

/etc/ldmsd.d/01-global-samplers.conf

[samplers.meminfo]
with_jobid = 1

[samplers.vmstat]
with_jobid = 1

And then some nodes additionally need the gpu sampler:

/etc/ldmsd.d/20-gpu-sampler.conf

[samplers.gpu]
with_jobid = 1

And then configuration management decides which nodes see just 01-global-samplers.conf, and which nodes get both 01-global-samplers.conf and 20-gpu-sampler.conf.

Higher level options might either go in the 01-global-samplers.conf, or in a separate file like so:

/etc/ldmsd.d/00-global.conf

port = 3992

Not a huge fan of the separate file. "Groups" of files for a particular configuration may have different defaults.

[samplers]
default_interval=1000000

I don't think the default_ prefix is necessary. It is obvious from the section in which it is found. It would be nice to support a suffix like 's', 'ms', etc... so that the above becomes:

interval=1s

default_offset = 0

It would be up to the administrator's personal preference. People can go as far as they want into splitting up their configurations into different files.

This approach seems to play well with existing standard practices.

The only issue I see so far is the potential loss of the "plugins" list (which wouldn't be as easy to maintain with this approach). But the only reason I have heard so far for needing that list is ordering of some kind. I'm not sure what ordering could be significant there, so maybe that really isn't an issue for 99% of users? And perhaps if there is a real need for the remaining 1%, then we can provide an optional "order = N" parameter under each sampler plugin (other plugin types too) to allow explicit ordering that way.

If the ordering matters, then the configuration and/or our design is broken.



tom95858 avatar Oct 31 '19 13:10 tom95858

While I like the idea that plug-in order does not matter in principle, it's not impossible for plugin libraries to expect other plugin libraries to be loaded (and maybe even configured) first in order to function correctly. We have fishiness in that regard around job samplers.

We could document a requirement for all plugins to load and act independently, with log errors about 'waiting for this other data source to appear' and eventual correct behavior once everything needed has been loaded. At least in the past, if you loaded the job sampler last and started other samplers before it, they would get misconfigured as having no job data.

baallan avatar Oct 31 '19 14:10 baallan