feat: support context json schema dump

Open greenhat616 opened this issue 1 month ago • 11 comments

Description

Close #1294. Add JSON Schema dump feature.

Currently blocked by https://github.com/crate-ci/git-conventional/pull/88.

Motivation and Context

Add a global option, --dump-context-schema, to dump the JSON Schema of the Context for the current version.

How Has This Been Tested?

Screenshots / Logs (if applicable)

Types of Changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation (no code change)
  • [ ] Refactor (refactoring production code)
  • [ ] Other

Checklist:

  • [x] My code follows the code style of this project.
  • [x] I have updated the documentation accordingly.
  • [x] I have formatted the code with rustfmt.
  • [x] I checked the lints with clippy.
  • [x] I have added tests to cover my changes.
  • [x] All new and existing tests passed.

greenhat616 avatar Oct 27 '25 03:10 greenhat616

Thanks for opening this pull request! Please check out our contributing guidelines! ⛰️

welcome[bot] avatar Oct 27 '25 03:10 welcome[bot]

Codecov Report

:x: Patch coverage is 47.91667% with 25 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 43.03%. Comparing base (85cc05d) to head (572e139).

Files with missing lines            Patch %   Lines
git-cliff-core/src/changelog.rs     60.00%    12 Missing :warning:
git-cliff/src/lib.rs                0.00%     9 Missing :warning:
git-cliff/src/main.rs               0.00%     3 Missing :warning:
git-cliff-core/src/remote/mod.rs    75.00%    1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1296      +/-   ##
==========================================
- Coverage   43.46%   43.03%   -0.43%     
==========================================
  Files          22       22              
  Lines        1972     1992      +20     
==========================================
  Hits          857      857              
- Misses       1115     1135      +20     
Flag Coverage Δ
unit-tests 43.03% <47.92%> (-0.43%) :arrow_down:


codecov-commenter avatar Oct 27 '25 03:10 codecov-commenter

I have done the fmt and clippy check.

Regarding the Schema → SDK issue: I agree this workflow is not ideal in most cases. As long as Cliff does not define a stable Schema version, every change to Context can break existing interfaces, and Python or TS do no type checking at runtime. Therefore, if we use a Schema, we must pin the current Cliff version.
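
To illustrate the runtime gap described here, a minimal Python sketch follows. The `Commit` shape and its field names are assumptions chosen for illustration, not the actual git-cliff Context definition, and `missing_keys` is a hypothetical helper. A static checker such as pyright can use the `TypedDict`, but nothing validates the data when it is loaded, so a renamed field goes unnoticed unless it is checked explicitly.

```python
import json
from typing import TypedDict, cast


class Commit(TypedDict):
    # Hypothetical, simplified shape; the real Context has many more fields.
    id: str
    message: str


def missing_keys(data: dict) -> list[str]:
    """Return the keys that Commit requires but `data` lacks."""
    return sorted(Commit.__required_keys__ - set(data))


# Pretend a newer release renamed "message" to "msg".
raw = json.loads('{"id": "abc123", "msg": "feat: add schema dump"}')

# cast() performs no validation, so this succeeds despite the mismatch.
commit = cast(Commit, raw)

print(missing_keys(raw))  # prints ['message']
```

With a schema pinned to one Cliff version, a check like this (or regenerated types) could run once at the boundary instead of failing later at the point of access.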

Regarding Schema generation: after careful consideration, I agree with defining a separate remote definition for git-conventional. However, the Release type inside git-cliff-core is quite complex; defining it in a third-party library would require significant effort to adapt after each release.

This PR is just a draft idea for now. Maybe a better approach: we could gate this behind a feature flag and, via some process (scripts, an xtask, etc.), generate a Schema when Cliff publishes a release and upload it to the Release artifacts. Would that be more appropriate?

greenhat616 avatar Oct 29 '25 12:10 greenhat616

cc: @orhun

Thanks for clarifying. I understand this PR is still a draft. However, it seems that some tests are failing. Have you run the tests locally to confirm?

Also, I’m a bit hesitant about this feature itself, as well as adding a related option to the CLI. We should consider whether this functionality is truly needed for most users. Even if it is, it might be cleaner, from a separation-of-concerns perspective, to add a separate binary under git-cliff-core rather than integrating it into the main CLI.

I agree this workflow is not ideal in most cases.

As you also noted, the current implementation direction seems problematic to me as well.

defining it in a third-party library would require significant effort to adapt after each release.

generate a Schema when Cliff publishes a release and upload it to the Releases artifacts.

In other words, what's being proposed here effectively shifts the maintenance burden to the maintainers, before we've even established a clear need for the feature. I think we should have a more thorough discussion about the necessity and maintenance cost of this functionality before proceeding further.

Personally, I would still recommend creating a separate repository dedicated to schema generation, which could depend on both git-cliff-core and git-conventional.

ognis1205 avatar Oct 29 '25 15:10 ognis1205

Hey, sorry for the delay on this. I'll be having a look soon hopefully

orhun avatar Oct 29 '25 18:10 orhun

Although I like the changes in this PR, I agree that the output of this CLI command would be too specific to the current release of git-cliff. In other words, it's too dynamic and it might limit our capability to update the context without worrying about user setups that depend on this. We could maybe use a version key, but I believe it will be incremented very quickly if we add/remove/update fields.

I'm not sure what a separate repo would look like for this. Are there any example projects doing this that we could draw inspiration from?

Thoughts?

orhun avatar Nov 03 '25 21:11 orhun

I don't know of any open-source projects that apply an implementation or operational approach like the one presented in this PR, but I believe there are internal tools that use JSON Schema to validate data type consistency.

I could try generating the schema using build.rs in a designated repository with the method I mentioned earlier. In particular, since the data models in git-cliff-core are expected to vary depending on feature flags (even though all backend feature flags are enabled by default in git-cliff), I think it would make more sense to manage the JSON Schemas in a separate dedicated repository, rather than deriving JsonSchema directly within git-cliff-core.

ognis1205 avatar Nov 04 '25 15:11 ognis1205

Yeah, I think that's a fair approach. Thanks for looking into this!

orhun avatar Nov 04 '25 15:11 orhun

@greenhat616

By the way, I've re-read the related issue:

https://github.com/orhun/git-cliff/issues/1294

and I’m wondering — is git-cliff actually being used in the development cycle described there?

From my understanding, it doesn't seem like the issue's context involves using git-cliff directly. Could you clarify if I'm missing something?

ognis1205 avatar Nov 04 '25 17:11 ognis1205

@ognis1205

By the way, I've re-read the related issue:

https://github.com/orhun/git-cliff/issues/1294

and I’m wondering — is git-cliff actually being used in the development cycle described there?

From my understanding, it doesn't seem like the issue's context involves using git-cliff directly. Could you clarify if I'm missing something?

The limitation of git-cliff is that it cannot run scripts (Lua, JS, etc.) or query third-party APIs for more detailed information when generating the changelog. Compared with introducing a scripting engine for a custom preprocessing stage, adding type info to the existing Context, so that third-party tools can handle it more easily, would require fewer changes, wouldn't it?

For example, if you're using third-party planning/project-management tools like Linear or Feishu and need to map your journal content (preferably only PRs) to task IDs and titles in those tools, then using Context with scripting for custom processing is a great fit.
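
As a rough sketch of that kind of post-processing, the Python script below scans commit messages for task IDs and attaches titles from a lookup table. Everything specific in it is an assumption for illustration: the `ENG-42` style ID pattern, the `TASK_TITLES` table (a stand-in for a real Linear or Feishu API call), the `tasks` key it adds, and the simplified sample context.

```python
import json
import re

# Assumed task ID format such as "ENG-42"; adjust to the tracker in use.
TASK_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

# Stand-in for a real API lookup against the project-management tool.
TASK_TITLES = {"ENG-42": "Dump context schema", "ENG-7": "Fix login redirect"}


def enrich(context: list[dict]) -> list[dict]:
    """Attach {"id", "title"} records for each task ID found in a commit message."""
    for release in context:
        for commit in release.get("commits", []):
            ids = TASK_ID.findall(commit.get("message", ""))
            # "tasks" is a hypothetical key, not part of the real Context.
            commit["tasks"] = [
                {"id": i, "title": TASK_TITLES.get(i, "(unknown)")} for i in ids
            ]
    return context


# Simplified sample; a real context would come from the tool's JSON output.
sample = json.loads(
    '[{"version": "v1.0.0", "commits": ['
    '{"message": "feat: schema dump (ENG-42)"},'
    '{"message": "chore: bump deps"}]}]'
)
print(json.dumps(enrich(sample), indent=2))
```

A script like this only touches the fields it needs, so it is also where a typed Context would help most: a renamed `commits` or `message` field would otherwise fail silently through the `.get()` defaults.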

Sorry, the generator is an internal tool, so I cannot share it publicly.

I don't know of any open-source projects that apply an implementation or operational approach like the one presented in this PR, but I believe there are internal tools that use JSON Schema to validate data type consistency.

I could try generating the schema using build.rs in a designated repository with the method I mentioned earlier. In particular, since the data models in git-cliff-core are expected to vary depending on feature flags (even though all backend feature flags are enabled by default in git-cliff), I think it would make more sense to manage the JSON Schemas in a separate dedicated repository, rather than deriving JsonSchema directly within git-cliff-core.

Yes, I agree with this approach as well. This PR is mainly a prototype for the workflow described above. Publishing it in a separate repository, or bundling it with a Release artifact, would both be good options. External users could use this file to directly generate the Context for the current version. The schema metadata could be provided as a cliff-core feature rather than as a command flag in the cli.

Although I like the changes in this PR, I agree that the output of this CLI command would be too specific to the current release of git-cliff. In other words, it's too dynamic and it might limit our capability to update the context without worrying about user setups that depend on this. We could maybe use a version key, but I believe it will be incremented very quickly if we add/remove/update fields.

I'm not sure how a separate repo would look like for this. Are there any example projects that do that we could get inspired from?

Thoughts?

@orhun

Yes. Cliff is an actively evolving project, and the Context may change quite frequently. There’s no plan to provide stable type guarantees, and introducing a VERSION field to mark versions would be cumbersome. So treating this as something (a) exposed only behind a feature flag, (b) published alongside each release, or (c) maintained as a per-version schema in the repository are all acceptable trade-offs.

Asking users to define their own interfaces to adapt to the Context is too cumbersome and hard to validate—if the type definitions change substantially, it’s difficult for type checkers like tsc or pyright to automatically surface those changes. That would make upgrading cliff versions more painful.

greenhat616 avatar Nov 05 '25 03:11 greenhat616

Thanks for the clarification, @greenhat616.

I still haven't fully grasped which specific field(s) of Context you intend to use in your use case, or where in the Context you'd like to add Linear-related information (such as task IDs or titles).

query third-part API to query more detailed information to generate changelog.

I'm not familiar with Linear myself, but according to their docs, if you include the Linear issue ID in the PR title, Linear automatically links the PR to the issue. If that's the case, wouldn't it be possible to use this feature:

https://github.com/orhun/git-cliff/pull/1287

in git-cliff to convert issue IDs in PR titles into links to Linear issues during changelog generation?

Also, regarding schema validation: typically, schema or type validation is used at untrusted data boundaries. I'm not sure if git-cliff really falls into that category. To me, the motivation for providing a JSON Schema seems more like "avoiding the need to handcraft data models," rather than addressing data integrity or validation concerns.

If only certain fields of the context are problematic, it might be simpler to just treat the context as plain JSON and access or update the relevant fields via their paths — without deserializing the entire structure into a strict data model. After all, git-cliff doesn't seem to operate in an untrusted environment.
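
A minimal sketch of that path-based approach, in Python. The sample document and its field names (`commits`, `message`, `extra`) are assumptions standing in for real context output, and `get_path` / `set_path` are hypothetical helpers, not part of git-cliff.

```python
import json
from typing import Any


def get_path(doc: Any, path: list, default: Any = None) -> Any:
    """Walk `doc` along `path` (dict keys or list indices); return `default` if absent."""
    current = doc
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


def set_path(doc: Any, path: list, value: Any) -> None:
    """Set the value at `path`; every element before the last must already exist."""
    parent = get_path(doc, path[:-1])
    parent[path[-1]] = value


# Simplified sample standing in for real context output.
context = json.loads(
    '[{"version": "v1.0.0", "commits": [{"message": "feat: x", "extra": null}]}]'
)

# Read one field without modelling the rest of the structure.
print(get_path(context, [0, "commits", 0, "message"]))  # prints: feat: x

# Update one field in place; everything else passes through untouched.
set_path(context, [0, "commits", 0, "extra"], {"task": "ENG-42"})
print(json.dumps(context))
```

The trade-off is that unknown or renamed fields show up as `default` values rather than as type errors, which is acceptable when only a handful of paths are read.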

That said, it might be a good idea to first make sure this PR passes the tests and that the JSON Schema is properly generated. At the moment, it's hard to verify anything without that.

ognis1205 avatar Nov 05 '25 05:11 ognis1205