robot icon indicating copy to clipboard operation
robot copied to clipboard

feature idea: add pre-commit hook for robot report

Open jclerman opened this issue 1 year ago • 6 comments

I use robot in my CI workflow to run tests against my ontology, via robot report. For me, those tests run quickly, but I forget to run them before pushing to a branch, and so then the CI workflow (that's triggered upon a push, in my set-up) fails when robot finds a violation.

It'd be great to have a way to call robot via pre-commit, so that I could catch such problems at git commit time instead of after a push. So...

I was thinking that a "script" (https://pre-commit.com/#script) hook would work well enough (it'd require that robot is pre-installed in the environment, but that's easy enough to check for); I have the script drafted already actually. The way pre-commit works, such hooks need to be retrievable from a git repository. The most appropriate place would be this repo (robot's).

If this'd be of interest I could probably submit a PR - let me know?

jclerman avatar Sep 14 '22 21:09 jclerman

@jclerman I would be interested just to see how this works. In ODK land we usually:

  1. protect master against commits from anyone
  2. And run CI testing as part of a github action as soon as a PR is made

This has the advantage that the person making the commit does not have to have the environment configured (robot or odk installed). For our ontologies projects the main problem is that we have linux, mac and windows contributors, all of which have to set up their environments somewhat differently.

All that said, I would still like to see how this works. Can you explain a bit what needs doing to enable precommit hooks (on client side), and how such a hook looks like?

matentzn avatar Sep 15 '22 08:09 matentzn

Our usual practice is similar to what you describe, with some additions. We also protect master against commits, except for merges, and merges are blocked unless they have a certain number of approvals (and unless CI tests have passed).

Regarding pre-commit: We generally have a set of pre-commit hooks configured as part of the project; the configuration is done by including a .pre-commit-config.yaml in the project's root directory (we do check that file into git). The hooks configured in that file are run at least once; maybe twice:

  1. (optionally) before a commit; whether they run then depends on whether pre-commit has been installed in the developer's working environment via pre-commit install.
  2. Our CI process (we use CircleCI, but it's analogous to github actions) is configured to, among other things, run pre-commit run --all-files (we do that upon every push to a branch). That runs the hooks against everything of course, whereas the optional step 1 would only run hooks against files changed in the commit.

Pre-commit hooks must be defined somewhere and available from a git repo; then they can be used by anyone. So let's say I want black to run against the python code in my repo to ensure that it's consistently formatted; I could include this in my .pre-commit-config.yaml in my repo:

repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
        language_version: python3.8

The black maintainers have a file .pre-commit-hooks.yaml at the above-referenced repo, which defines the "black" hook I referenced above; the definition looks like this:

- id: black
  name: black
  description: "Black: The uncompromising Python code formatter"
  entry: black
  language: python
  minimum_pre_commit_version: 2.9.2
  require_serial: true
  types_or: [python, pyi]

jclerman avatar Sep 15 '22 19:09 jclerman

@jclerman this all sounds pretty amazing. Would you be able to share with us the projects you have implemented this?

(optionally) before a commit; whether they run then depends on whether pre-commit has been installed in the developer's working environment via pre-commit install.

Ok so the presence of the pre-commit config does not preclude contributors in an environment that is not set up for running the CI to make a commit / push to a branch. That is good. It is basically a feature a developer may chose to install to protect themselves.

How would the pre-commit stuff for ROBOT look like roughly? We need very different ROBOT QC checks for every ontology (we support more than 60 individual ontologies, most of which with some form of CI).

matentzn avatar Sep 16 '22 08:09 matentzn

Unfortunately all of the projects I've implemented this on so far are private, visibility limited to my employer. To be clear, I haven't set up a ROBOT pre-commit hook yet - that's what this ticket is about :). But I may have drafted code for that :).

In my CI flow, I invoke ROBOT report a couple times: Once against the version of the ontology that I edit directly, and once against the "reasoned" version of the ontology (after I've used ROBOT to infer things like superclasses, and have written the resulting updates into a "final ontology" build-artifact).

For the pre-commit hook, I'd expect to run ROBOT report just against that first, user-edited (source-code?) version of the ontology, since that's all that's available at commit-time.

The hook itself can be configured, in each ontology-repo's .pre-commit-config.yaml, in a number of ways including an arbitrary list of arguments to pass to the hook - so for example, the argument to --profile can be specified in the hook configuration. So here is what I imagine:

.pre-commit-hooks.yaml (in the root of the ROBOT repo)

-   id: robot-report
    name: ROBOT report
    description: This hook runs ROBOT (https://robot.obolibrary.org/) report to check for ontology errors.
    entry: bin/robot-report.bash
    language: script
    files: \.(owl|OWL)$

bin/robot-report.bash (in the ROBOT repo - this location can vary but should match the path above)

#!/bin/bash

if ! command -v robot &> /dev/null
then
    echo "robot could not be found. Install it per https://robot.obolibrary.org/"
    exit 1
fi

TARGET_FILE=$1
shift
REMAINING_ARGS=$@

robot report --input $TARGET_FILE $REMAINING_ARGS

And then in a given ontology-repo's root, we'd have: .pre-commit-config.yaml

repos:
  - repo: https://github.com/ontodev/robot
    rev: v1.9.0
    hooks:
      - id: robot-report
      args: [--profile tests/content/robot-report-profile.txt]

Of course, additional args could be included too. I am not sure if something special needs to be done to tell pre-commit where to find the root of the repository (the location of tests/content/... above) or if that's already the working directory - I'd need to experiment.

jclerman avatar Sep 17 '22 01:09 jclerman

Very interesting! I am not the owner of the ROBOT repo, but my main concern is that for most ontologies, a high level of customisation is needed. Also, pre-commit hooks seem very platform dependent. Nevertheless, the idea should be explored. Is it possible to try this on one of your own repos first and come back sharing your experiences? I can imagine quite a few hooks that are useful (validate-profile). Also (tangentially) it would be nice to hear how you plan to set up all client machines to be able to execute the hook. Would be kinda awesome to have brew install robot :P

matentzn avatar Sep 18 '22 13:09 matentzn

I like the idea of pre-commit hooks, and I set them up for myself sometimes. The real benefit comes when every contributor runs the pre-commit hooks reliably. The https://pre-commit.com/ tool streamlines the setup as much as possible, I guess, but it's still an extra step and an extra tool to install. So I don't use it on my projects. I've found that GitHub Actions and similar CI on the repository side is more reliable. If people mess up their first commit, CI should catch it, and maybe you squash the merge to make the history cleaner.

The proposal here isn't asking for much. I'm not completely opposed to it, but I am hesitant. It's one more top-level file adding clutter, and one more thing I'll have to support into the future. As you can see, there are many other open issues that seem higher priority to me, and out resources are quite limited.

If this were one-size-fits-all, and if ODK was going to use it, then that would be persuasive. But as @matentzn points out, I expect most projects will want customization, and I know most of our projects use GitHub Actions to run robot report.

jamesaoverton avatar Sep 19 '22 13:09 jamesaoverton