
chore: add json-schema for configuration syntax

Open amimas opened this issue 2 years ago • 11 comments

This PR adds a JSON Schema for the gitlabform configuration syntax, as described in #499. A few things to note about this schema:

The schema is hand-written rather than generated from the gitlabform build. I don't know of anything in place that could auto-generate it. Generating it might be too complicated, and hand-crafting the schema is probably good enough, since gitlabform passes most of the configuration directly to the GitLab API as parameters rather than defining its own config syntax.

In this PR, only the schema for the top-level syntax has been added. For example:

config_version: 3

gitlab:
  ssl_verify: true
  timeout: 10
  token: blah-23tlws623xwett
  url: gitlab.my-domain.com

skip_groups:
  - group1/subgroupA
  - group2/*

skip_projects:
  - group3/project1

The goal is to merge this PR with the schema for the above config so that we have a starting point. That way others can also contribute, and we can easily review changes in smaller chunks.
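As a rough illustration of what a schema covering the config above could look like, here is a minimal sketch written as a Python dictionary and serialized to JSON. The property types and the draft version are assumptions inferred from the example values; this is not the schema actually added in this PR.

```python
import json

# Hypothetical sketch: the types below are inferred from the example config,
# not copied from the schema files in this PR.
TOP_LEVEL_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "gitlabform configuration (top-level keys only)",
    "type": "object",
    "properties": {
        "config_version": {"type": "integer"},
        "gitlab": {
            "type": "object",
            "properties": {
                "ssl_verify": {"type": "boolean"},
                "timeout": {"type": "integer"},
                "token": {"type": "string"},
                "url": {"type": "string"},
            },
        },
        # Each entry is a group or project path; wildcards such as "group2/*"
        # are plain strings as far as the schema is concerned.
        "skip_groups": {"type": "array", "items": {"type": "string"}},
        "skip_projects": {"type": "array", "items": {"type": "string"}},
    },
}


def schema_as_json() -> str:
    """Serialize the sketch the way it would be stored in a .json file."""
    return json.dumps(TOP_LEVEL_SCHEMA, indent=2)


if __name__ == "__main__":
    print(schema_as_json())
```

Starting with only these four top-level keys keeps the first review small; nested sections can be added as further properties later.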

amimas avatar Feb 23 '23 02:02 amimas

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.46%. Comparing base (e6102b6) to head (6caf601). Report is 345 commits behind head on main.

@@            Coverage Diff             @@
##             main     #504      +/-   ##
==========================================
- Coverage   83.50%   83.46%   -0.04%     
==========================================
  Files          70       70              
  Lines        2746     2746              
==========================================
- Hits         2293     2292       -1     
- Misses        453      454       +1     

see 3 files with indirect coverage changes

codecov-commenter avatar Feb 23 '23 02:02 codecov-commenter

Hey, this looks good @amimas and I am totally for the incremental approach.

Can we just add some code to actually test whether the test configs you added match the provided schema?

gdubicki avatar Feb 24 '23 10:02 gdubicki

Thanks @gdubicki. I was using that test file in my editor to try out the schema. I can look into adding automated tests, but I'm kind of waiting on the decision about where the schema will be hosted.

I left some comments in the linked issue. If we host the schema under the gitlabform domain, we'll need to implement the necessary test framework. If it is hosted in the schemastore project, I believe the test framework is already set up and we'll just need to add test cases.

Let me know how we should proceed.

amimas avatar Feb 25 '23 00:02 amimas

I don’t have experience with JSON/YAML schemas, @amimas. Let’s do it the way you recommend, so it will be better for the users. So what would that be? 😊

Of course I can help with creating the necessary deployment flow for the schema.

gdubicki avatar Feb 25 '23 09:02 gdubicki

@gdubicki @amimas quick addition from me as I've done something similar at work.

You can use the jsonschema package (https://python-jsonschema.readthedocs.io/en/stable/) to validate some example config files and expect successes/failures. An example might look like this; of course, you'd turn it into test cases, and it will need some tweaks.

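from pathlib import Path

import jsonschema
import jsonschema.exceptions
import jsonschema.validators
import yaml  # PyYAML; needed for yaml.safe_load below

# Placeholders: replace with your own paths, relative to the current directory
YOUR_SCHEMA = "schema.json"  # path to your schema
YOUR_CONFIG = "config.yml"  # path to your config

# Load the YAML config into plain Python data
with open(Path.cwd() / YOUR_CONFIG, "r") as f:
    config = yaml.safe_load(f)

# Resolve the relative $ref below against the current directory.
# Note: RefResolver is deprecated since jsonschema 4.18 in favor of the
# referencing library.
resolver = jsonschema.validators.RefResolver(
    base_uri=f"{Path.cwd().as_uri()}/", referrer=True,
)
# Raises jsonschema.exceptions.ValidationError if the config does not match
jsonschema.validate(
    instance=config, schema={"$ref": YOUR_SCHEMA}, resolver=resolver,
)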

This also works really well with Red Hat's yaml extension for vscode by using the following in target files, so you could document that if you like.

---
# yaml-language-server: $schema=https://raw.githubusercontent.com/gitlabform/gitlabform/main/schema/gitlabform-v3.json
# ... start gitlabform config

I've only tested it with local files as we have them in the same repo but it should work. See: https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml#associating-schemas

Edit: come to think of it, you could distribute the current schema with the package and create a validate subcommand for gitlabform. But that's probably a follow-up

nejch avatar Mar 05 '23 19:03 nejch

Hi @gdubicki . Sorry this PR has been stale. I got a little busy lately and didn't have time to continue. I'll try to pick it up again.

Thanks for your input, @nejch. This is my first time developing a JSON schema for something. Based on your experience, would you recommend that we host the schema in this repo and also on the https://gitlabform.github.io/gitlabform/ website? I was thinking of hosting it in the https://github.com/SchemaStore/schemastore repo. One benefit that comes to mind is that, I assume, that repo already has the necessary basic validation and CI/CD in place, so we wouldn't need to reinvent it here. What are your thoughts?

amimas avatar Jun 04 '23 16:06 amimas

I think hosting in-repo leaves you with more flexibility. It is what GitLab does with their own CI schema.

In schemastore you can add it as a link to the source repo (again, like GitLab has), so you still get all the automatic IDE benefits.

I don't think you need to host it on the mkdocs site; probably just a note that it is available in the repo?

adam-moss avatar Aug 02 '23 16:08 adam-moss

Thanks for the suggestion @adam-moss . I was starting to lean towards hosting in-repo as well. Glad to see the gitlab ci schema example. I should've checked it earlier too. Now I just need to get back to this PR... :)

amimas avatar Aug 02 '23 20:08 amimas

I have been learning more about JSON Schema, especially automated schema validation using the python-jsonschema library.

The PR is now ready for review and merge, @gdubicki . Unfortunately, Python 3.7 now stands in the way. The above library doesn't support 3.7, so the Python 3.7 tests are currently failing because the library cannot be installed.

Aside from that, here are the major changes I made since my initial setup:

Split the schema into independent files for different portions of the config (i.e. gitlab, skip_groups, skip_projects, etc.). They are then all referenced from the main schema file (schemas/src/gitlabform.json) to compose the overall schema of the gitlabform configuration syntax. I believe this allows a lot more flexibility for testing and validating the schema more rigorously. It should also make the schema easier to maintain, because you won't be dealing with one giant JSON file.

The "ID" of each schema is hardcoded to where it will be available as a raw file from GitHub. For example, for the schemas/src/gitlabform.json schema, the ID is set to https://raw.githubusercontent.com/gitlabform/gitlabform/main/schemas/gitlabform.json. Note that this doesn't mean unit testing or local development will use that copy of the file. It's simply a "URI" that is resolved via the referencing library. I used the above ID (and similar ones for the other schemas) because that's what will be submitted to the SchemaStore catalog. We could use a different ID, but then we'd have to introduce an extra build step into this project that rewrites the schema files with IDs that are resolvable URLs. We'd then have to figure out how to distribute those updated schema files (i.e. GitHub release, gitlabform docs site, etc.). I took the easy path 😄.

And finally, unit testing the schemas themselves. I hope the test files are self-explanatory. I did add a pytest plugin, though: pytest-dependency, so that validating sample JSON/configs against a schema only happens if the test that validates the schema itself succeeds. And for validating schemas with sample JSON/configs, I used parametrized tests.

When you get a chance, please take a look. I hope this will set the foundation for others to help build the rest of the schemas.
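To illustrate the point above that a schema ID is only a URI used as a lookup key, not a download location, here is a minimal standard-library sketch. It is not how the referencing library is implemented; the trimmed schemas, the ID of the gitlab sub-schema, and the resolve_ref helper are all hypothetical.

```python
from urllib.parse import urljoin

# ID of the main schema, as stated in the PR description.
MAIN_ID = (
    "https://raw.githubusercontent.com/gitlabform/gitlabform/"
    "main/schemas/gitlabform.json"
)

# Hypothetical, heavily trimmed schemas standing in for the files in the PR.
MAIN_SCHEMA = {
    "$id": MAIN_ID,
    "type": "object",
    "properties": {"gitlab": {"$ref": "gitlab.json"}},
}
GITLAB_SCHEMA = {
    # Assumed sibling ID; the real sub-schema IDs may differ.
    "$id": urljoin(MAIN_ID, "gitlab.json"),
    "type": "object",
}

# In-memory store keyed by $id: nothing is fetched over the network.
STORE = {schema["$id"]: schema for schema in (MAIN_SCHEMA, GITLAB_SCHEMA)}


def resolve_ref(base_id: str, ref: str) -> dict:
    """Resolve a possibly relative $ref against the referring schema's $id."""
    return STORE[urljoin(base_id, ref)]


if __name__ == "__main__":
    ref = MAIN_SCHEMA["properties"]["gitlab"]["$ref"]
    print(resolve_ref(MAIN_ID, ref)["$id"])
```

Because lookups go through the store, local tests can register the files from the working tree under their published IDs and never touch the copy on GitHub.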

amimas avatar Aug 18 '23 03:08 amimas

I just went through all of the code in this PR and I am super impressed. Thank you, @amimas, for this great contribution! 😊

I would be happy to approve, but I have a little doubt about how this will handle versioning of gitlabform.

The schema at https://raw.githubusercontent.com/gitlabform/gitlabform/main/schemas/gitlabform.json will (in the best-case scenario, if we release a new app version on every schema file update) only match the latest app version, but users sometimes stick to older versions for a while.

Wouldn't that be a problem?

gdubicki avatar Sep 08 '23 08:09 gdubicki

Hi @gdubicki . That's a good question. I think it depends on how this schema will be used.

My proposal is to provide this schema so that gitlabform users get auto-completion, hints, validations, etc. in their editors when composing the config file. For this, 2 things are needed:

  • Make the schema files available to the users
  • Configure IDE/editor setting to use that schema file with certain files

This is usually simplified by making the schemas available through SchemaStore, because major IDEs and/or their YAML extensions support it natively. So, adding this schema to SchemaStore will be the next step after this PR is merged.

We can create a new version of the schema for every release or every major version of gitlabform. But the issue is that schemas are associated with filenames. That's what allows editors to provide the functionality mentioned above. So, when we publish the schema in the SchemaStore catalog (or even if we manually add it to the editors), what filename will we reference? I'm thinking of something like *.gitlabform.yml or *.gitlabform.yaml. Now, bringing in the versioning topic, we could require version numbers in the file name (i.e. *.gitlabform-v3.yml). But gitlabform doesn't require a special file name as long as it's a YAML file (i.e. config.yml), right? If gitlabform expected the config file name to match the exact version (i.e. config.gitlabform-v3.7.0.yml), I think upgrading would be inconvenient, as users would need to rename their config files on every upgrade.
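The filename association can be sketched roughly as follows. The entry is shaped like the entries in SchemaStore's catalog.json (name, description, fileMatch, url), but its values are only the proposal from this comment, and fnmatchcase is a stand-in for whatever glob matching a given editor actually uses.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog entry; the fileMatch patterns are the ones proposed above.
CATALOG_ENTRY = {
    "name": "gitlabform",
    "description": "gitlabform configuration file",
    "fileMatch": ["*.gitlabform.yml", "*.gitlabform.yaml"],
    "url": (
        "https://raw.githubusercontent.com/gitlabform/gitlabform/"
        "main/schemas/gitlabform.json"
    ),
}


def schema_applies(filename: str, entry: dict = CATALOG_ENTRY) -> bool:
    """Return True if any fileMatch pattern matches the given file name."""
    return any(fnmatchcase(filename, pattern) for pattern in entry["fileMatch"])


if __name__ == "__main__":
    for name in ("team-a.gitlabform.yml", "config.yml", "config.gitlabform-v3.yml"):
        print(name, schema_applies(name))
```

With these patterns, a plain config.yml gets no schema support unless the user renames it or maps it manually in the editor, and a versioned name such as config.gitlabform-v3.yml would need its own pattern.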

So, I think it's okay that our schemas always describe the latest version. Look at the GitLab CI schema or the GitHub Actions schema: users can be on different versions of their self-hosted GitLab or GitHub, but they'll see the latest hints/descriptions from those schemas.

We can still publish a new schema matching each version of gitlabform. But users will have to manually configure their editor's settings to consume those schemas. Not sure how many users will do that, though 😄.

We can also be a little more expressive in the schema. For example:

  • In the description, indicate which version of gitlabform a key is available from
  • Use deprecated to indicate that a key has been deprecated
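A small sketch of what those two conventions might look like in a schema fragment. The key names and the version placeholder are hypothetical; deprecated is the standard JSON Schema annotation keyword (drafts 2019-09 and later), and how an editor surfaces it depends on the editor.

```python
from typing import List

# Hypothetical fragment: key names and the version text are placeholders.
FRAGMENT = {
    "type": "object",
    "properties": {
        "new_setting": {
            "type": "boolean",
            "description": "Example setting. Available since gitlabform vX.Y.",
        },
        "old_setting": {
            "type": "boolean",
            "description": "Example setting. Deprecated; use new_setting instead.",
            "deprecated": True,
        },
    },
}


def deprecated_keys(schema: dict) -> List[str]:
    """List the property names that carry the deprecated annotation."""
    properties = schema.get("properties", {})
    return sorted(
        name for name, sub in properties.items() if sub.get("deprecated") is True
    )


if __name__ == "__main__":
    print(deprecated_keys(FRAGMENT))
```

Since deprecated is an annotation, a config that still uses old_setting remains valid; the keyword only gives tooling something to flag.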

That was a long answer to your question. Hope that made sense 😅. Let me know what you think, and whether the plan for the next step (which filenames to associate the schema with, and adding it to the SchemaStore catalog) makes sense.

amimas avatar Sep 09 '23 14:09 amimas