feat: support context json schema dump
Description
Close #1294. Add JSON Schema dump feature.
Currently blocked by https://github.com/crate-ci/git-conventional/pull/88.
Motivation and Context
Add a global option, --dump-context-schema, that dumps the Context JSON Schema for the current version.
How Has This Been Tested?
Screenshots / Logs (if applicable)
Types of Changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation (no code change)
- [ ] Refactor (refactoring production code)
- [ ] Other
Checklist:
Codecov Report
:x: Patch coverage is 47.91667% with 25 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 43.03%. Comparing base (85cc05d) to head (572e139).
Additional details and impacted files
@@ Coverage Diff @@
## main #1296 +/- ##
==========================================
- Coverage 43.46% 43.03% -0.43%
==========================================
Files 22 22
Lines 1972 1992 +20
==========================================
Hits 857 857
- Misses 1115 1135 +20
| Flag | Coverage Δ | |
|---|---|---|
| unit-tests | 43.03% <47.92%> (-0.43%) | :arrow_down: |
I have run the fmt and clippy checks.
Regarding the Schema → SDK issue: I agree this workflow is not ideal in most cases. While Cliff does not define a stable Schema version, every change to Context can break existing interfaces—and when using Python or TS there is no type checking at runtime. Therefore, if we use a Schema, we must pin the current Cliff version.
Regarding Schema generation, after careful consideration, I agree with defining a separate remote definition for git-conventional. However, the Release type inside git-cliff-core is quite complex; defining it in a third-party library would require significant effort to adapt after each release.
This PR is just a draft idea for now. Maybe a better approach would be to gate this behind a feature flag and, via some process (scripts, an xtask, etc.), generate a Schema when Cliff publishes a release and upload it to the release artifacts. Would that be more appropriate?
cc: @orhun
Thanks for clarifying. I understand this PR is still a draft. However, it seems that some tests are failing. Have you run the tests locally to confirm?
Also, I’m a bit hesitant about this feature itself, as well as adding a related option to the CLI. We should consider whether this functionality is truly needed for most users. Even if it is, it might be cleaner, from a separation-of-concerns perspective, to add a separate binary under git-cliff-core rather than integrating it into the main CLI.
> I agree this workflow is not ideal in most cases.
As you also noted, the current implementation direction seems problematic to me as well.
> defining it in a third-party library would require significant effort to adapt after each release.
>
> generate a Schema when Cliff publishes a release and upload it to the Releases artifacts.
In other words, what's being proposed here effectively shifts the maintenance burden to the maintainers, before we've even established a clear need for the feature. I think we should have a more thorough discussion about the necessity and maintenance cost of this functionality before proceeding further.
Personally, I would still recommend creating a separate repository dedicated to schema generation, which could depend on both git-cliff-core and git-conventional.
Hey, sorry for the delay on this. I'll be having a look soon hopefully
Although I like the changes in this PR, I agree that the output of this CLI command would be too specific to the current release of git-cliff. In other words, it's too dynamic and it might limit our capability to update the context without worrying about user setups that depend on this. We could maybe use a version key, but I believe it will be incremented very quickly if we add/remove/update fields.
I'm not sure what a separate repo would look like for this. Are there any example projects doing this that we could draw inspiration from?
Thoughts?
I don't know of any open-source projects that apply an implementation or operational approach like the one presented in this PR, but I believe there are internal tools that use JSON Schema to validate data type consistency.
I could try generating the schema using build.rs in a designated repository with the method I mentioned earlier. In particular, since the data models in git-cliff-core are expected to vary depending on feature flags (even though all backend feature flags are enabled by default in git-cliff), I think it would make more sense to manage the JSON Schemas in a separate dedicated repository, rather than deriving JsonSchema directly within git-cliff-core.
Yeah, I think that's a fair approach. Thanks for looking into this!
@greenhat616
By the way, I've re-read the related issue:
https://github.com/orhun/git-cliff/issues/1294
and I’m wondering — is git-cliff actually being used in the development cycle described there?
From my understanding, it doesn't seem like the issue's context involves using git-cliff directly. Could you clarify if I'm missing something?
@ognis1205
> By the way, I've re-read the related issue:
> https://github.com/orhun/git-cliff/issues/1294
> and I’m wondering, is git-cliff actually being used in the development cycle described there?
> From my understanding, it doesn't seem like the issue's context involves using git-cliff directly. Could you clarify if I'm missing something?
The limitation of git-cliff is that it cannot run scripts (Lua, JS, etc.) or query third-party APIs for more detailed information when generating the changelog. Compared with introducing a scripting engine for a custom preprocessing stage, adding type information to the existing Context, so that third-party tools can handle it more easily, would require fewer changes, wouldn’t it?
For example, if you’re using third-party planning/project-management tools like Linear or Feishu and need to map your journal content (preferably only PRs) to task IDs and titles in those tools, then using Context with scripting for custom processing is a great fit.
Sorry, the generator is an internal tool, so I cannot share it publicly.
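Since the internal generator cannot be shared, here is a minimal, hypothetical sketch of the kind of post-processing described above. It assumes the context is a JSON list of releases, each with a `commits` list whose entries carry `message` and `extra` fields. The ticket pattern, the `FAKE_TRACKER` lookup table, and the `ticket` key are made up for illustration; a real setup would call the Linear or Feishu API instead.

```python
import json
import re

# Hypothetical ticket pattern such as "ENG-123"; adjust to your tracker.
TICKET_RE = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")

# Hypothetical stand-in for a Linear/Feishu API lookup.
FAKE_TRACKER = {"ENG-123": "Support schema dump"}


def enrich_context(releases, lookup):
    """Attach ticket metadata to each matching commit's `extra` field in place."""
    for release in releases:
        for commit in release.get("commits", []):
            match = TICKET_RE.search(commit.get("message", ""))
            if match is None:
                continue  # no ticket reference in this commit message
            ticket_id = match.group(1)
            extra = commit.get("extra") or {}
            extra["ticket"] = {"id": ticket_id, "title": lookup.get(ticket_id)}
            commit["extra"] = extra
    return releases


# Minimal, made-up context; the real one has many more fields.
context = json.loads("""
[{"version": "v1.0.0",
  "commits": [
    {"id": "abc123", "message": "feat: add schema dump (ENG-123)", "extra": null},
    {"id": "def456", "message": "chore: bump deps", "extra": null}
  ]}]
""")

enriched = enrich_context(context, FAKE_TRACKER)
print(json.dumps(enriched, indent=2))
```

In practice the input would presumably come from `git cliff --context` and the enriched file would be fed back with `--from-context`; the exact shape of the context depends on the git-cliff version in use, which is the point of the typing discussion here.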
> I don't know of any open-source projects that apply an implementation or operational approach like the one presented in this PR, but I believe there are internal tools that use JSON Schema to validate data type consistency.
>
> I could try generating the schema using build.rs in a designated repository with the method I mentioned earlier. In particular, since the data models in git-cliff-core are expected to vary depending on feature flags (even though all backend feature flags are enabled by default in git-cliff), I think it would make more sense to manage the JSON Schemas in a separate dedicated repository, rather than deriving JsonSchema directly within git-cliff-core.
Yes, I agree with this approach as well. This PR is mainly a prototype for the workflow described above. Publishing it in a separate repository, or bundling it with a Release artifact, would both be good options. External users could use this file to directly generate the Context for the current version. The schema metadata could be provided as a cliff-core feature rather than as a command flag in the cli.
> Although I like the changes in this PR, I agree that the output of this CLI command would be too specific to the current release of git-cliff. In other words, it's too dynamic and it might limit our capability to update the context without worrying about user setups that depend on this. We could maybe use a version key, but I believe it will be incremented very quickly if we add/remove/update fields.
>
> I'm not sure how a separate repo would look like for this. Are there any example projects that do that we could get inspired from?
>
> Thoughts?
@orhun
Yes. Cliff is an actively evolving project, and the Context may change quite frequently. There’s no plan to provide stable type guarantees, and introducing a VERSION field to mark versions would be cumbersome. So treating this as something (a) exposed only behind a feature flag, (b) published alongside each release, or (c) maintained as a per-version schema in the repository are all acceptable trade-offs.
Asking users to define their own interfaces to adapt to the Context is too cumbersome and hard to validate—if the type definitions change substantially, it’s difficult for type checkers like tsc or pyright to automatically surface those changes. That would make upgrading cliff versions more painful.
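To illustrate how a per-version schema could surface such changes, here is a rough sketch that diffs two schema versions by property path. The schemas and field names below are invented for the example and are not the real Context schema. The walker only follows inline `properties`; it does not resolve `$ref` or definitions, which generated schemas commonly use.

```python
def flatten_properties(schema, prefix=""):
    """Map dotted property paths to their declared JSON Schema type."""
    paths = {}
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        paths[path] = sub.get("type")
        if sub.get("type") == "object":
            # Recurse into nested objects so their fields get dotted paths.
            paths.update(flatten_properties(sub, prefix=f"{path}."))
    return paths


def diff_schemas(old, new):
    """Report properties that were removed, added, or changed type."""
    old_paths = flatten_properties(old)
    new_paths = flatten_properties(new)
    return {
        "removed": sorted(old_paths.keys() - new_paths.keys()),
        "added": sorted(new_paths.keys() - old_paths.keys()),
        "retyped": sorted(
            p for p in old_paths.keys() & new_paths.keys()
            if old_paths[p] != new_paths[p]
        ),
    }


# Two made-up schema versions, for illustration only.
old_schema = {"type": "object", "properties": {
    "version": {"type": "string"},
    "timestamp": {"type": "integer"},
    "commit": {"type": "object", "properties": {"id": {"type": "string"}}},
}}
new_schema = {"type": "object", "properties": {
    "version": {"type": "string"},
    "timestamp": {"type": "string"},
    "commit": {"type": "object", "properties": {"sha": {"type": "string"}}},
}}

report = diff_schemas(old_schema, new_schema)
print(report)
```

A check like this could run in CI when bumping the pinned cliff version, so that a renamed or retyped field fails loudly instead of silently breaking a Python or TS consumer.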
Thanks for the clarification, @greenhat616.
I still haven't fully grasped which specific field(s) of Context you intend to use in your use case, or how and where in the Context you’d like to add Linear-related information (such as task IDs or titles).
> query third-part API to query more detailed information to generate changelog.
I'm not familiar with Linear myself, but according to their docs, if you include the Linear issue ID in the PR title, Linear automatically links the PR to the issue. If that's the case, wouldn't it be possible to use this feature:
https://github.com/orhun/git-cliff/pull/1287
in git-cliff to convert issue IDs in PR titles into links to Linear issues during changelog generation?
Also, regarding schema validation: typically, schema or type validation is used at untrusted data boundaries. I'm not sure if git-cliff really falls into that category. To me, the motivation for providing a JSON Schema seems more like "avoiding the need to handcraft data models," rather than addressing data integrity or validation concerns.
If only certain fields of the context are problematic, it might be simpler to just treat the context as plain JSON and access or update the relevant fields via their paths — without deserializing the entire structure into a strict data model. After all, git-cliff doesn't seem to operate in an untrusted environment.
That said, it might be a good idea to first make sure this PR passes the tests and that the JSON Schema is properly generated. At the moment, it's hard to verify anything without that.
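As a rough illustration of the path-based alternative suggested above, the following sketch reads and updates individual fields of a context document without modeling the whole structure. The helper names `get_path` and `set_path` and the sample context are hypothetical, and the sample is far smaller than a real context.

```python
import json


def get_path(doc, path, default=None):
    """Follow a sequence of dict keys / list indexes; return default if absent."""
    current = doc
    for key in path:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (isinstance(current, list) and isinstance(key, int)
              and -len(current) <= key < len(current)):
            current = current[key]
        else:
            return default
    return current


def set_path(doc, path, value):
    """Set the value at an existing parent path, leaving other fields untouched."""
    *parents, last = path
    target = doc
    for key in parents:
        target = target[key]
    target[last] = value


# Made-up, minimal context for illustration.
context = json.loads(
    '[{"version": "v1.0.0",'
    ' "commits": [{"id": "abc123", "message": "feat: x"}]}]'
)

message = get_path(context, [0, "commits", 0, "message"])
missing = get_path(context, [0, "no_such_field", "x"], default="n/a")
set_path(context, [0, "commits", 0, "message"], "feat: x (ENG-123)")
print(message, missing, context[0]["commits"][0])
```

The trade-off is the one discussed in this thread: unknown fields pass through unchanged, but a renamed field only shows up as a missing value at runtime rather than as a type error.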