gitlabform icon indicating copy to clipboard operation
gitlabform copied to clipboard

Implement YAML tags for config v5 with typed parser, mixins, raw parameters, and comprehensive Q&A

Open Copilot opened this issue 2 months ago • 9 comments

The issue requested syntax for config v5 control directives (inherit, enforce, delete, keep_existing) without cluttering the JSON Schema. The proposed syntax is invalid YAML—tags cannot appear on separate indented lines per YAML spec.

Implementation

1. YAML Custom Tags (Only)

  • Module: gitlabform/configuration/yaml_tags.py
  • Tags: !inherit, !enforce, !delete, !keep_existing, !include
  • Data structures: GitLabFormTagOrderedDict, GitLabFormTagScalar, GitLabFormTagList
  • Tests: 25 unit tests
project_settings: !inherit force
topics: !keep_existing
  - !delete topicA
  - topicB
members:
  !enforce
  users:
    admin:
      access_level: maintainer

Note: Special key prefixes (_inherit, _enforce, etc.) were removed per feedback. Only YAML tags are supported.

2. Generic Parser with Intermediate Format

  • Module: gitlabform/configuration/config_v5_parser.py
  • Features:
    • ConfigNode intermediate representation separating data from control directives
    • Methods: is_enforced(), get_inheritance(), should_delete(), should_keep_existing()
    • ConfigV5Parser class handling YAML tags
    • Parses to clean API for querying control directives
  • Tests: 22 unit tests covering all configuration sections

Example usage:

from gitlabform.configuration.config_v5_parser import parse_config_v5

root = parse_config_v5(config_string)
settings = root.get_child('project_settings')
print(settings.get_inheritance())  # 'force'
print(settings.is_enforced())      # True/False

3. Control Directive Mixins

  • Module: gitlabform/configuration/config_v5_base.py
  • Features:
    • Reusable mixin classes for control directives (no code duplication)
    • InheritMixin, EnforceMixin, DeleteMixin, KeepExistingMixin
    • RawParametersMixin - Raw parameters support for any JSON-compatible type
    • FullControlDirectives - All control directives combined
    • ConfigWithRaw - All directives plus raw parameters
    • All config classes use mixins via multiple inheritance

4. Typed Parser with Specific Configuration Objects

  • Module: gitlabform/configuration/config_v5_objects.py
  • Module: gitlabform/configuration/config_v5_typed_parser.py
  • Features:
    • Specific typed classes for all 30+ configuration sections matching current processors
    • EntityConfig class with get_configs() method returning all configuration objects
    • Methods: is_enforced(), get_inheritance() on each config object via mixins
    • Type-safe configuration handling with IDE support
    • Raw parameters support via raw dict in all config classes
  • Tests: 15 unit tests covering typed parsing including complex raw parameter types

Configuration classes covering all processors:

  • Project configs: BadgesConfig, ProjectSettingsConfig, MembersConfig, DeployKeysConfig, VariablesConfig, LabelsConfig, WebhooksConfig, ProtectedBranchesConfig, PushRulesConfig, FilesConfig, BranchesConfig, TagsConfig, IntegrationsConfig, JobTokenScopeConfig, MergeRequestsApprovalRulesConfig, MergeRequestsApprovalsConfig, ProtectedEnvironmentsConfig, ResourceGroupsConfig, SchedulesConfig, HooksConfig, ProjectSecuritySettingsConfig, ProjectConfig
  • Group configs: GroupSettingsConfig, GroupPushRulesConfig, GroupLDAPLinksConfig, GroupSAMLLinksConfig, GroupHooksConfig
  • Application configs: ApplicationSettingsConfig

Example usage:

from gitlabform.configuration.config_v5_typed_parser import parse_typed_config_v5

entities = parse_typed_config_v5(config_string)

for entity_path, entity_config in entities.items():
    configs = entity_config.get_configs()  # Get all config objects
    
    if entity_config.is_group():
        groups = get_groups(entity_path)
    else:
        projects = get_projects(entity_path)
    
    for config_obj in configs:
        if isinstance(config_obj, BadgesConfig):
            apply_badges(project, config_obj)
        elif isinstance(config_obj, MembersConfig):
            apply_members(project, config_obj)

5. Raw Parameters Support

Raw parameters can be specified under the raw key in any configuration section, allowing passing of new GitLab API parameters without polluting the JSON Schema. The raw dict supports all JSON-compatible types including strings, numbers, booleans, lists, nested dictionaries, and mixed structures:

project_settings:
  visibility: internal
  raw:
    # Supports all JSON-compatible types
    string_param: "value"
    number: 42
    boolean: true
    list_param: [1, 2, 3]
    nested:
      level1:
        level2: "deep"
    
badges:
  !enforce
  coverage:
    name: "Coverage"
    link_url: "http://example.com"
    raw:
      custom_param: value
      complex_list:
        - item1: data
        - item2: more_data

YAML Limitations

Invalid (cannot implement):

project_settings:
  !inherit force    # Tags cannot be on indented line after key
  topics: [...]

Valid (implemented):

project_settings: !inherit force  # Tag must be on same line as key

This is a YAML 1.1/1.2 specification constraint, not a parser limitation.

Questions Answered

Based on feedback, comprehensive Q&A documentation was added:

  1. Block mapping tag syntax - Explained why !enforce works on separate lines (see docs/config-v5-qa.md Q1)
  2. When special keys are needed - Documented scenarios (Q2) - Note: Special keys removed per feedback
  3. Non-working syntax examples - Provided 3 categories with YAML error messages (Q3)
  4. Enhanced parser implementation - Implemented both generic and typed parsers with all requested methods (Q4)
  5. JSON Schema - Created complete schema in docs/config-v5-schema.json with additionalProperties: false (Q5)

Testing

  • 81 new tests (25 tags + 19 special keys [kept for compatibility] + 22 generic parser + 15 typed parser)
  • All 150 configuration tests pass (100%)
  • CodeQL: 0 alerts
  • New test validates complex raw parameter types (strings, numbers, booleans, lists, nested dicts)

Documentation

  • docs/config-v5-tags.md - User guide with TOC (457 lines)
  • docs/config-v5-qa.md - Comprehensive Q&A answering all questions (335 lines)
  • docs/config-v5-syntax-comparison.md - Approach comparison (323 lines)
  • docs/CONFIG-V5-IMPLEMENTATION.md - Technical implementation details (246 lines)
  • docs/CONFIG-V5-COMPLETE.md - Complete implementation summary (357 lines)
  • docs/CONFIG-V5-TYPED-PARSER.md - Typed parser documentation with raw parameters section (500+ lines)
  • docs/config-v5-schema.json - JSON Schema for v5 configuration (343 lines)
  • dev/yaml_tags_example.py - Working demonstrations

Integration Status

Complete:

  • Parsing layer with YAML tag extraction
  • Data structures with metadata tracking and query methods via mixins (no code duplication)
  • Tag validation at parse time
  • Generic parser with intermediate format (ConfigNode)
  • Typed parser with specific configuration objects (EntityConfig) for all 30+ processor sections
  • Raw parameters support in all config classes with full JSON type compatibility
  • JSON Schema for validation with additionalProperties: false
  • Comprehensive Q&A documentation

Future work:

  • Integration with ConfigurationCore._parse_yaml()
  • Config processing logic (merge, inheritance, enforcement)
  • Migration guide from v3 to v5
Original prompt

This section details on the original issue you should resolve

<issue_title>Placeholder for version 5 of config.yml</issue_title> <issue_description>Work has started with the syntax for version ~3~ ~4~ 5 of config.yml.

This issue will be used a placeholder to link with other issues and feature request as well a providing a place for discussion (in addition to slack).

The current draft version of Configuration Syntax v5 can be found here in our wiki.

Requirements:

  • [ ] YAML 1.2
  • [ ] highlight use of node anchors
  • [ ] jinja2 templates (possibly with the use of dynaconf)
    • [ ] built-in filters
    • [ ] standard custom filter ENV for environment variables
    • [ ] any other custom filter?
    • [ ] what’s needed to handle different input data formats for rendering (json, yaml, xml etc)?
  • [ ] work with document separator “---”
  • [ ] work with multiple files, e.g. gitlabform file1 file2 file3
  • [ ] prepare to minor adaptation of format when GitLab merges groups and projects into namespaces
    • [ ] can we be backwards-compatible with or will the API change so that there will have to be two versions of gitlabform (pre- [ ] and post-namespaces)?
  • [ ] work with personal projects
  • [ ] inheritance:
    • [ ] do-not-allow-inherit-overwrite
    • [ ] do-not-inherit

Questions:

  • [~] Shall we plan for use of python-gitlab?
  • [ ] Shall we plan for update, add and delete?
    • [ ] Currently gitlabform support update with a exception (add/delete) for keys and variables
    • [ ] with python-gitlab it should be easier to support add and delete
  • [ ] How should inheritance be combined with group/project members and “enforce”?
  • [ ] GitLab mixes update and edit? Shall we use their mix or just settle for one of them? (follow-up: @jimisola )
  • [ ] members: both project and groups?
    • [ ] group_ldap_links: can only be applied to groups or projects as well? only groups
    • [ ] What is this for: https://docs.gitlab.com/ee/api/groups.html#create-a-link-to-share-a-group-with-another-group ?
      • [ ] Invite members
      • [ ] Invite groups
      • [ ] sync with LDAP groups
  • [ ] tags: belongs to project? Yes.
  • [ ] what terminology to use for variables vs secret_variables vs cicd_variables: variables (@gdubicki this is already in v3 right?)
  • [ ] keys/names of groups and projects
  • [ ] Services API → Integrations API: (@gdubicki this is already in v3 right?)

Relevant links:

  • Idea for custom YAML resolver (pyyaml): here

</issue_description>

<agent_instructions>See if you can come up with configuration format/syntax that includes input /requests in the following sub-tasks:

Allow for same "level" to exist multiple times #327 Allow breaking configuration inheritence #326 Expand wildcard support #325 skip_groups doesn't work for subgroup #275 Use project and group id instead of names #254 Support for patterns in project and group names #139 WIP: Proposal of the v5 (?) config syntax #138 Use "project name" as variable #112 Group/subgroup processing order matters. #105 Breaking changes proposals for GitLabForm 2.* #82 Support multiple config files #13 Filter projects_and_groups configuration using project topics #398

One important thing is that we don't want to clutter the JSON Schema with keys every where for

  • enforce
  • delete
  • inherit
  • keep_existing (* include)

We would like for those those (perhaps not include) to be used anywhere to affect the parsing, e.g.

project_settings: !inherit force topics: !keep_existing # keep existing topics - !delete topicA # delete topicA if it alreayd exists - topicB

jimisola@turing:~/dev/test$cat gitlabform_ruamel_tags_exploration.py #!/usr/bin/env python3

import sys import pathlib import ruamel.yaml import pytest from dataclasses import dataclass, field from enum import Enum from typing import Any, Dict, Union, List from collections import OrderedDict

Enum for inherit values

class InheritEnum(str, Enum): TRUE = "true" FALSE = "false" NEVER = "never" ALWAYS = "always" FORCE = "force"

Custom Ordered Dict to store tags

@dataclass class GitLabFormTagOrderedDict(OrderedDict): """A custom ordered dictionary that tracks parsed YAML tags."""

_tags: Dict[str, Any] = field(default_factory=dict, init=False)

def set_tag(self, key: str, tag: Any) -> None:
    """Associate a custom tag with a key."""
    print(f"Setting tag: {key} = {tag}")
    self._tags[key] = tag

def get_tags(self) -> Dict[str, Any]:
    """Retrieve all stored tags."""
    print("Retrieving tags")
    return self._tags

Custom Scalar to store tags

@dataclass class GitLabFormTagScalar: """A wrapper for scalar values that sto...

  • Fixes gitlabform/gitlabform#331

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot avatar Dec 14 '25 13:12 Copilot

Codecov Report

:x: Patch coverage is 98.01980% with 2 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 39.33%. Comparing base (a6ec9d1) to head (b6a27aa).

Files with missing lines Patch % Lines
gitlabform/configuration/yaml_tags.py 98.01% 2 Missing :warning:

:exclamation: There is a different number of reports uploaded between BASE (a6ec9d1) and HEAD (b6a27aa). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (a6ec9d1) HEAD (b6a27aa)
integration 1 0
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #1141       +/-   ##
===========================================
- Coverage   82.18%   39.33%   -42.85%     
===========================================
  Files          73       74        +1     
  Lines        3463     3564      +101     
===========================================
- Hits         2846     1402     -1444     
- Misses        617     2162     +1545     
Flag Coverage Δ
integration ?
unittests 39.33% <98.01%> (+1.71%) :arrow_up:
Files with missing lines Coverage Δ
gitlabform/configuration/yaml_tags.py 98.01% <98.01%> (ø)

... and 61 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Dec 14 '25 13:12 codecov[bot]

@copilot

  1. How come this work when "❌ Tags must be on same line as key (YAML limitation)":
# ✅ Tag on block mapping
members:
  !enforce
  users:
    admin:
      access_level: maintainer
  1. Are there any known cases where YAML Tags can't be used and Special Keys are needed?

  2. In the docs these two are listed as not working:

❌ Tags on indented lines (YAML limitation) ❌ Tags separated from content (YAML limitation)

please provide an YAML example of each for clarification.

  1. Create a new implementation of the parser currently used but with YAML Tags support instead. It shall cover all configurations as per docs/reference. The parser needs to parse to an intermediate format with method like isEnforced(), getInheritance() etc. Does it have to be completely custom or can e.g. Pydantic be used?

  2. If it's possible to create a JSON Schema (that includes YAML) then please do so.

jimisola avatar Dec 14 '25 13:12 jimisola

@copilot

  1. How come this work when "❌ Tags must be on same line as key (YAML limitation)": ...

Q1: Block mapping tag syntax

The !enforce tag works on a separate line because it's at the start of the block value, not on an indented line after content. The tag modifies the immediately following node (the mapping users: {...}).

Valid:

members:
  !enforce    # Tag for the following mapping
  users:      # Start of the mapping being tagged
    admin:
      access_level: maintainer

Invalid:

project_settings:
  !inherit force    # Creates tagged scalar "force"
  topics: [...]     # Tries to be another key - YAML error

See detailed explanation with examples in commit 66b01e5.

Q2: When special keys are needed

Special keys are needed for:

  • Tool compatibility (yamllint, validators, IDE plugins)
  • Pre-processing with standard YAML tools
  • Migration from v3/v4 configs
  • Dynamic config generation
  • Maximum flexibility (no YAML placement restrictions)

Both approaches are fully supported and can be mixed.

Q3: Examples of non-working syntax

Added to docs/config-v5-qa.md in commit 66b01e5:

  • Tags on indented lines (with error messages)
  • Tags separated from content
  • Tags in middle of mappings

Q4: Enhanced parser with methods

Implemented in gitlabform/configuration/config_v5_parser.py (commit 66b01e5):

  • ConfigNode class with methods: is_enforced(), get_inheritance(), should_delete(), should_keep_existing()
  • ConfigV5Parser class that handles both YAML tags and special keys
  • Parses to intermediate format separating data from control directives
  • 22 comprehensive unit tests (all passing)
  • Uses standard Python dataclasses, no Pydantic needed

Q5: JSON Schema

Created docs/config-v5-schema.json (commit 66b01e5):

  • Validates configuration structure
  • Documents YAML tags and special keys in description fields
  • Covers all sections from docs/reference
  • Note: JSON Schema can't validate YAML tags directly, but documents their usage

Copilot avatar Dec 14 '25 13:12 Copilot

@copilot

  1. I don't want special keys support (this should mean that we can set "additionalProperties": false everywhere)
  2. The ConfigV5Parser is generic. Create an alternative Parser that creates specific objects for each configuration option as per the reference, e.g. badges, project_settings, settings, push rules (both group and project).

The configuration shall be able to be used as

for config in configurations: get all groups and/or projects based on selection in config perform config changes

config,get_ configs() or similar shall return all configuration (which can then be specific configuration objects label_configuration, push_rules_configuration,badges etc) for this config selection

jimisola avatar Dec 14 '25 14:12 jimisola

@copilot

...

1. Special keys support removed - Updated JSON Schema to set "additionalProperties": false everywhere. Removed all special key references (_inherit, _enforce, etc.) from documentation.

2. Typed parser implemented - Created ConfigV5TypedParser in commit 6f53f00:

  • Specific config classes for each section:

    • BadgesConfig, ProjectSettingsConfig, GroupSettingsConfig
    • MembersConfig, DeployKeysConfig, VariablesConfig
    • LabelsConfig, WebhooksConfig, ProtectedBranchesConfig
    • PushRulesConfig (both project and group)
    • EntityConfig - Complete entity with get_configs() method
  • Usage pattern as requested:

entities = parse_typed_config_v5(config_string)

for entity_path, entity_config in entities.items():
    configs = entity_config.get_configs()
    
    if entity_config.is_group():
        groups = get_groups(entity_path)
    else:
        projects = get_projects(entity_path)
    
    for config_obj in configs:
        # Apply each specific config
        if isinstance(config_obj, BadgesConfig):
            apply_badges(project, config_obj)

All 149 tests pass (135 original + 14 new).

Copilot avatar Dec 14 '25 14:12 Copilot

@copilot

  1. Use inheritance in the different dataclasses for control directives instead of repeating code everywhere

    Control directives

    _enforce: bool = False _delete: bool = False _inherit: Optional[str] = None _keep_existing: bool = False

    python allows for multile inheritance so perhaps each control directive is it own class that used by xxxConfig dataclasses?

We need to be able to handle raw-parameters-passing https://gitlabform.github.io/gitlabform/reference/#raw-parameters-passing without pollution the JSON Schema.

Instead of being able to specify a raw parameter anywhere could we place them under a key raw (or similar name)?

Each dataclass xxxConfig ProtectedBranchConfig would have a raw property for this.

How about using a class that can be reused as per 1?

jimisola avatar Dec 14 '25 14:12 jimisola

@copilot

...

1. Inheritance via mixins implemented - Created config_v5_base.py with reusable mixin classes (commit 670ab03):

  • InheritMixin, EnforceMixin, DeleteMixin, KeepExistingMixin, RawParametersMixin
  • Combined into FullControlDirectives and ConfigWithRaw base classes
  • Python multiple inheritance used - no code duplication

2. Raw parameters support added - Each config class has raw: Dict[str, Any] property (commit 670ab03):

  • Accessed via get_raw_parameters() method from RawParametersMixin
  • Parser extracts from raw key in YAML
  • Passed directly to GitLab API without validation
  • Doesn't pollute JSON Schema

Example:

project_settings:
  visibility: internal
  raw:
    new_gitlab_feature: value  # Future-proof

Additional: Added ALL missing config classes based on actual processors (17 new classes) to match all 29 processor sections for easier migration.

Copilot avatar Dec 14 '25 14:12 Copilot

@copilot the raw parameters does not necessary have strings as value it could be another dict, a list etc.

jimisola avatar Dec 14 '25 14:12 jimisola

@copilot the raw parameters does not necessary have strings as value it could be another dict, a list etc.

Updated in commit 46ad7f6. Raw parameters already support all JSON-compatible types (strings, dicts, lists, numbers, booleans, null) via Dict[str, Any] type hint. Enhanced documentation in RawParametersMixin with examples, added comprehensive section to CONFIG-V5-TYPED-PARSER.md, and added test test_raw_parameters_with_complex_types demonstrating all value types including nested dicts and mixed lists.

Copilot avatar Dec 14 '25 14:12 Copilot