
Add more forensic information to the Database metadata

Open youngmit opened this issue 5 years ago • 8 comments

The Database3 class stores some metadata in the HDF5 attributes of the file. Things like user, date, and framework version are included, but some information is still missing for getting a full picture of what happened in a given run. At the very least, we need to add the Application name and version. Anything else?

youngmit · Oct 20 '20 16:10

My assumption is that this is the block of code we are talking about, and it looks like it already has the basics:

https://github.com/terrapower/armi/blob/master/armi/bookkeeping/db/database3.py#L563

Though, as you say, we could grab the App subclass name starting at:

  • armi.getApp()

The only other things that immediately jump out at me are details about the OS:

  • sys.platform # just "linux"
  • os.name # posix
  • platform.system() # Linux
  • platform.release() # 4.15.0-140-generic
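
A minimal sketch of collecting those OS values into one dictionary of strings, ready to be written as HDF5 attributes. The function name `os_metadata` and the key names are hypothetical, not existing ARMI API.

```python
import os
import platform
import sys


def os_metadata() -> dict:
    """Collect operating-system details for the database attributes.

    Every value is a plain string, so it can be stored directly as an
    HDF5 attribute. (Hypothetical helper; key names are illustrative.)
    """
    return {
        "platform": sys.platform,       # e.g. "linux"
        "osName": os.name,              # e.g. "posix"
        "system": platform.system(),    # e.g. "Linux"
        "release": platform.release(),  # kernel/OS release string
    }
```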

Is this code usually run from within a Git repo? If so, we can attempt to grab the current commit, in case someone is running a real release with their own modifications:

  • subprocess.check_output(["git", "describe"])
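
Plain `git describe` exits with an error when the repository has no tags, and it fails outright when the code is not in a Git checkout at all. A more defensive sketch (the helper name `git_describe` is hypothetical) adds `--always` and `--dirty` and returns `None` instead of raising:

```python
import subprocess
from typing import Optional


def git_describe(path: str = ".") -> Optional[str]:
    """Return `git describe` output for the repo containing `path`, or None.

    --always falls back to an abbreviated commit hash when there are no
    tags; --dirty appends "-dirty" if tracked files have local changes.
    """
    try:
        result = subprocess.run(
            ["git", "describe", "--always", "--dirty"],
            cwd=path,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError: git missing or `path` unusable as a working directory.
        # CalledProcessError: `path` is not inside a git work tree.
        return None
    return result.stdout.strip()
```

Returning `None` keeps a database write from failing just because a run happens outside a repository.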

Personally, I would grab the self._openCount from the class as well.

Is there anything else you can think of you'd like on there?

john-science · Apr 14 '21 21:04

For the git version, would using a package like setuptools_scm be appropriate? It embeds the output of git describe into the package distribution pretty seamlessly in my experience. Then armi.__version__ would carry that information: where git describe reports 0.1.6-41-g3e7ac5a, setuptools_scm's default scheme gives something like 0.1.7.dev41+g3e7ac5a, indicating the most recent tag (0.1.6, bumped to the next version), the number of commits ahead of that tag (41), and the short hash of the commit (3e7ac5a).

The downsides I see with using this as a way to get the revision are

  1. It doesn't seem to provide an easy way to extract just the commit (without parsing the __version__ string), let alone the full 40-character commit hash
  2. If you're exactly on a git tag, then the version string is just the tag, e.g., 0.1.6 with no commit data
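
To make the first downside concrete, here is a sketch of pulling the commit back out of such a version string. The helper `commit_from_version` and its regular expression are assumptions, written against the raw `git describe` form and setuptools_scm's default `+g<hash>` local version; other version schemes would need a different pattern.

```python
import re
from typing import Optional

# Matches the "g<hash>" node marker found both in raw `git describe`
# output ("0.1.6-41-g3e7ac5a") and in setuptools_scm's default
# PEP 440 form ("0.1.7.dev41+g3e7ac5a").
_NODE = re.compile(r"[-+]g([0-9a-f]{7,40})")


def commit_from_version(version: str) -> Optional[str]:
    """Extract the abbreviated commit hash embedded in a version string.

    Returns None when there is no hash, e.g. for an exact tag such as
    "0.1.6" (downside 2 above). Even on success this is only the short
    hash, not the full commit (downside 1).
    """
    match = _NODE.search(version)
    return match.group(1) if match else None
```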

Maybe also the hostname, via os.uname().nodename (Unix only) or the portable socket.gethostname()?

drewejohnson · Apr 15 '21 15:04

@drewejohnson I like your hostname idea. That's an easy win.

For the commit hash, my original concern was that someone might run ARMI on their own dev branch, not a release version. To debug such a run, they will need some way to identify where their workspace was at run time.

Do users typically run the code from within the Git repo folders?

  • If Yes, then the above concern is easy: we just use the GitPython library to grab the current branch name and commit hash.
  • If No, then we have to include the commit hash somewhere in the distribution, like you were saying.
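
A sketch of the "If Yes" case. It swaps plain `git` subprocess calls in for GitPython to avoid an extra dependency (with GitPython, `Repo(path, search_parent_directories=True)` plus `head.commit.hexsha` and `active_branch.name` would play the same role). The function and key names are hypothetical:

```python
import subprocess
from typing import Dict, Optional, Union


def _git(args, cwd: str) -> Optional[str]:
    """Run one git command in `cwd`; return its stripped stdout, or None on failure."""
    try:
        out = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing, `cwd` unusable, or `cwd` not in a git work tree
        return None
    return out.stdout.strip()


def git_workspace_info(path: str = ".") -> Dict[str, Union[str, bool, None]]:
    """Record where a workspace was at run time: branch, full commit, dirtiness."""
    commit = _git(["rev-parse", "HEAD"], path)
    if commit is None:
        # Not a git checkout, e.g. a copied distribution on a network drive
        return {"branch": None, "commit": None, "dirty": None}
    status = _git(["status", "--porcelain"], path)
    return {
        # Prints "HEAD" when detached, e.g. when a tag is checked out
        "branch": _git(["rev-parse", "--abbrev-ref", "HEAD"], path),
        "commit": commit,  # full commit hash, not the abbreviated one
        # Any porcelain output means modified, staged, or untracked files
        "dirty": None if status is None else bool(status),
    }
```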

john-science · Apr 15 '21 16:04

Hi @theJollySin! I think those would be valuable. The main things I had in mind when I made the ticket were the Git information, in case the user is on an unreleased/untagged/development version of the code, and perhaps some more details about the plugins that are enabled.

@drewejohnson I hadn't actually come across that package; I'll have to check it out!

The sticky thing with the plugin info is that ARMI doesn't make any prescriptive rules about how exactly plugins should be laid out. Internally, we use git submodules under the app repository, each of which is expected to be installed with pip install -e . A more stable application may expect the plugins to be properly installed in the host Python environment. If all of the plugins do something like @drewejohnson suggests, stuffing all interesting/relevant info into __version__, then we can just use those, but I'm not sure how well this would work with all the ways that we install plugins.

I'll do some experiments to see how reasonable it would be to expect all plugins to express everything we would want (even in development situations) through __version__. If that seems tractable, then ARMI could just loop over all of the App's registered plugins, read their __version__s, and put them somewhere in the database metadata. Otherwise we may need to consider an App.describe() function that would be responsible for implementing this.
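
A rough sketch of that loop, assuming each plugin's top-level package defines `__version__`. Here the plugins are passed in as a plain iterable; in ARMI they would come from the App's pluggy plugin manager (pluggy's `PluginManager.get_plugins()` returns the registered plugins). `plugin_versions` is a hypothetical name.

```python
import sys
from typing import Dict, Iterable, Optional


def plugin_versions(plugins: Iterable[object]) -> Dict[str, Optional[str]]:
    """Map each plugin's name to the __version__ of its top-level package.

    Accepts plugin classes or instances. A package that defines no
    __version__ maps to None.
    """
    versions = {}
    for plugin in plugins:
        name = getattr(plugin, "__qualname__", type(plugin).__qualname__)
        moduleName = getattr(plugin, "__module__", type(plugin).__module__)
        topLevel = moduleName.split(".")[0]
        # Heuristic: assume the version lives on the top-level package
        versions[name] = getattr(sys.modules.get(topLevel), "__version__", None)
    return versions
```

The plugin-to-package step is only a heuristic: two plugins from one package report the same version, and a package without `__version__` reports `None`.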

youngmit · Apr 15 '21 16:04

Yeah, the more I think about this, the more it seems that a lot of the implementation details, like plugin versions, are really the prerogative of the specific App. For one, there isn't a guaranteed 1:1 mapping from a Python package/version to a given plugin, so enumerating the plugins to get their "versions" gets a bit fraught.

Having an App.describe() -> str function seems like the way to go. ARMI can call it and stuff whatever it gets from the App instance into the db attrs. Seem reasonable?
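
A minimal sketch of that contract. `App` here is a stand-in rather than ARMI's real class, and `MyApp`, `app_attrs`, the attribute keys, and the sample description string are all hypothetical:

```python
from typing import Dict


class App:
    """Hypothetical stand-in for an ARMI App base class."""

    name = "armi"

    def describe(self) -> str:
        # Default: nothing beyond the name; applications override this.
        return ""


class MyApp(App):
    """Hypothetical downstream application."""

    name = "myApp"

    def describe(self) -> str:
        # The App decides what matters: plugin versions, commit hashes, ...
        return "myApp 1.2.0; physicsKernel rev abc1234"


def app_attrs(app: App) -> Dict[str, str]:
    """Build the App-provided entries for the database file's attributes."""
    return {
        "appName": app.name,
        "appClass": type(app).__qualname__,
        "appDescription": app.describe(),
    }
```

Each key/value pair could then be assigned into the HDF5 file's `attrs` mapping next to the existing user/date/version entries. Keeping the return type a plain `str` means ARMI never has to understand the App's format.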

youngmit · Apr 15 '21 17:04

@youngmit Sorry, I couldn't tell how to grab a list of all the plugins.

I like App.describe(). If the plugins are external executables, I would also vote for grabbing their absolute paths.

john-science · Apr 15 '21 19:04

@youngmit I like the idea of App.describe, seems simple enough. Would it be useful to return a dictionary mapping plugins exposed by the app to their versions? Something like {"ARMI": "0.1.6", "App": Y, "SomePhysicsKernel": X}. This, combined with a tool that pins the revision number in the package version (for armi and other kernels as submodules maybe) could be very useful for tracking down what combination of tools led to a specific result
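
A sketch of the dictionary variant using the standard library's `importlib.metadata` (Python 3.8+). The helper name `describe_versions` is hypothetical, and so is any list of distribution names passed to it; only the App knows which distributions make up its stack.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def describe_versions(distributions: Iterable[str]) -> Dict[str, str]:
    """Return {distribution name: installed version} for each name given.

    Missing distributions are recorded as "not installed" rather than
    raising, so a partially installed stack can still be described.
    """
    result = {}
    for dist in distributions:
        try:
            result[dist] = version(dist)
        except PackageNotFoundError:
            result[dist] = "not installed"
    return result
```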

And, somewhat related to what @theJollySin asked: I'm not sure what the TerraPower process is, but my workflow is usually something like this:

  1. Checkout a specific branch or revision in ARMI as a submodule for an in-development ARMI-app
  2. Checkout a specific branch or revision in one or more physics tools
  3. Install locally with pip install .
  4. Move to some other directory containing settings and blueprints files.
  5. Run the ARMI app

It's not unreasonable to think that the settings and blueprint files may also be under version control (like the FFTF models in https://github.com/terrapower/fftf-isothermal-model) but that might be a whole new can of worms

drewejohnson · Apr 15 '21 19:04

> @youngmit Sorry, I couldn't tell how to grab a list of all the plugins.
>
> I like App.describe(). If the plugins are external executables, I would also vote for grabbing their absolute paths.

That was another thing that was in the back of my mind when I made the issue!

Plugins are instances of some class derived from ArmiPlugin. They are all registered with the application's plugin manager object, which is implemented by the pluggy library. So looping over all of those and taking something like sys.modules[plugin.__module__].__file__ for each one (the same trick that sys.modules[armi._app.__class__.__module__].__file__ does for the App itself) is something that the framework can do on its own. That would at least allow us to record where the plugin code came from.
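
A sketch of that framework-side lookup. It uses each plugin's own `__module__` (which works for plugin classes and instances alike) rather than the App's, and maps modules without a file to `None`. `plugin_locations` is a hypothetical name, and the plugins are passed in as a plain iterable rather than read from the plugin manager:

```python
import sys
from typing import Dict, Iterable, Optional


def plugin_locations(plugins: Iterable[object]) -> Dict[str, Optional[str]]:
    """Map each plugin to the file of the module that defines it.

    Modules without a file (built-ins, some frozen or interactive
    modules) map to None.
    """
    locations = {}
    for plugin in plugins:
        name = getattr(plugin, "__qualname__", type(plugin).__qualname__)
        moduleName = getattr(plugin, "__module__", type(plugin).__module__)
        module = sys.modules.get(moduleName)
        locations[name] = getattr(module, "__file__", None)
    return locations
```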

However, determining which version of the (possibly Git-managed) package provided that plugin is a harder problem. Short of coming up with some standardized place to put such information for a plugin, it seems like the app just needs to know where its plugins come from and do whatever it thinks makes sense to get that sort of information. @drewejohnson we could probably come up with something on the ARMI side to try to handle this, but I worry that it might become a little too magical (or would have to, in order to support all possible cases).

Going back to @theJollySin's question above:

> Do users typically run the code from within the Git repo folders?

We actually do both! When running locally, the code (ARMI itself and all of our plugins) is all in their respective Git repos (again, we typically install all of our plugin submodules with pip install -e). When we run on the cluster, we do a lot of sausage making and copy the result up to a network drive. As @theJollySin was saying, to maintain a sense that the code came from some development version of whatever repo, we would need to embed that information somewhere when we make the distribution (an entirely reasonable thing to expect plugins to do).
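
One way a plugin could do that embedding, sketched under assumptions: a packaging step writes a hypothetical `_build_info.json` file into the package directory, and the code reads it back at run time. Both function names are hypothetical, and the file would also need to be declared as package data so it is actually copied into the distribution:

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Hypothetical file name; any package-data file would do.
BUILD_INFO_NAME = "_build_info.json"


def write_build_info(package_dir: Path, info: Dict[str, Optional[str]]) -> Path:
    """Packaging step: freeze repository details (commit, branch, ...) into
    the package directory so they travel with the copied distribution."""
    target = Path(package_dir) / BUILD_INFO_NAME
    target.write_text(json.dumps(info, indent=2, sort_keys=True))
    return target


def read_build_info(package_dir: Path) -> Dict[str, Optional[str]]:
    """Run-time step: recover what was frozen at build time, or {} if the
    package was never stamped (e.g. a plain editable checkout)."""
    target = Path(package_dir) / BUILD_INFO_NAME
    if not target.is_file():
        return {}
    return json.loads(target.read_text())
```

At run time a plugin would call something like `read_build_info(Path(__file__).parent)`; an empty result means the package was never stamped, which is itself useful forensic information.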

So yeah, I think getting the locations of all of the Plugin classes is pretty reasonable, because I can see this working in pretty much all cases: if you were able to register a plugin, then you must have imported it, so it lives in a module whose underlying file path you can almost always get. ARMI could handle that. Mapping plugins to packages seems harder; at least, I'm not seeing a sufficiently standard way to do so. Any ideas there?
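
One possible starting point on Python 3.10+ is `importlib.metadata.packages_distributions()`, which maps top-level import names to the installed distributions that provide them. The sketch below (with a hypothetical `distributions_for` helper) applies it to a plugin's module. It does not resolve the many-to-many cases discussed above, and install styles that don't record their files may not show up:

```python
from typing import List

try:
    # Added in Python 3.10; maps top-level import names to distributions.
    from importlib.metadata import packages_distributions
except ImportError:  # older interpreters
    packages_distributions = None


def distributions_for(plugin: object) -> List[str]:
    """Return the installed distribution(s) providing a plugin's top-level package.

    An empty list means no installed distribution claims the package
    (standard library modules, bare directories on sys.path, ...).
    """
    if packages_distributions is None:
        return []
    moduleName = getattr(plugin, "__module__", type(plugin).__module__)
    topLevel = moduleName.split(".")[0]
    return list(packages_distributions().get(topLevel, []))
```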

youngmit · Apr 15 '21 20:04

The metadata tag in our settings file is being removed. In the 3 years since this ticket was opened, it only ever held one thing: the ARMI version. And even that was entirely unused, as "metadata" is not readable through yamlize.

We are greatly expanding the version tooling in our settings files, and removing metadata forever.

john-science · May 24 '23 18:05