Add tool reporting outdated l10n documents by lastmod difference
This PR adds a tool (report-outdated-by-mod.py) that reports outdated l10n documents by Lastmod difference.
- Ref: #42441
- The "outdated content" warning on localization pages was introduced in https://github.com/kubernetes/website/pull/41768. That automation adds a warning message to a localized page when the English version of the page has been updated more recently than the localized page. This is determined by comparing the Lastmod of the page in English with the Lastmod of the same page in a given localization.
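The Lastmod comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR or from #41768: the function names and the sample dates are hypothetical, and it assumes both Lastmod values are already available as timezone-aware datetimes.

```python
from datetime import datetime, timezone


def lastmod_gap_days(english_lastmod: datetime, localized_lastmod: datetime) -> float:
    """Days by which the English page is newer (negative if the localized page is newer)."""
    return (english_lastmod - localized_lastmod).total_seconds() / 86400


def is_outdated(english_lastmod: datetime, localized_lastmod: datetime) -> bool:
    """Flag the localized page when the English page was modified more recently."""
    return english_lastmod > localized_lastmod


# Hypothetical Lastmod values for one page in English and in Korean.
en = datetime(2024, 4, 10, tzinfo=timezone.utc)
ko = datetime(2024, 1, 5, tzinfo=timezone.utc)
print(is_outdated(en, ko), lastmod_gap_days(en, ko))
```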
This script compares markdown files across different language directories to identify and report localized documents that may be outdated, based on modification date differences.
It focuses primarily on:
- Reporting outdated documents based on modification date differences.
- Estimating false alerts.
- Calculating the similarity between the English version and localized versions of documents. (The similarity analysis includes line counts, special character patterns, and English word usage patterns.)
The table-style output is useful for maintaining localized documents and for checking the overall status of all languages.
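To make the three similarity signals listed above more concrete, here is a rough sketch of how they could be computed. It is an assumption-laden illustration rather than the script's actual logic: the set of "special characters", the word pattern, and the scoring formulas are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical choice of Markdown-ish special characters to compare.
SPECIAL_CHARS = re.compile(r"[#*`\[\](){}<>|_~-]")
# Hypothetical definition of an "English word": two or more ASCII letters.
ENGLISH_WORD = re.compile(r"[A-Za-z]{2,}")


def ratio(a: int, b: int) -> float:
    """Similarity of two counts in [0, 1]; 1.0 when they are equal."""
    if a == b:
        return 1.0
    return min(a, b) / max(a, b)


def counter_overlap(a: Counter, b: Counter) -> float:
    """Multiset overlap (intersection over union) of two Counters."""
    union = sum((a | b).values())
    if union == 0:
        return 1.0
    return sum((a & b).values()) / union


def similarity_signals(english: str, localized: str) -> dict:
    """Compute the three signals named in the PR description for one page pair."""
    en_words = Counter(w.lower() for w in ENGLISH_WORD.findall(english))
    l10n_words = Counter(w.lower() for w in ENGLISH_WORD.findall(localized))
    return {
        "line_count": ratio(len(english.splitlines()), len(localized.splitlines())),
        "special_chars": counter_overlap(
            Counter(SPECIAL_CHARS.findall(english)),
            Counter(SPECIAL_CHARS.findall(localized)),
        ),
        "english_words": counter_overlap(en_words, l10n_words),
    }


english_page = "# Title\n\nUse `kubectl get pods`.\n"
localized_page = "# \uc81c\ubaa9\n\n`kubectl get pods`\n"
print(similarity_signals(english_page, localized_page))
```

The idea is that command names, code spans, and Markdown structure usually survive translation, so a localized page that tracks the English page closely should score high on all three signals.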
How to use
$ python ./scripts/report-outdated-by-mod.py --help
Usage: report-outdated-by-mod.py [-h] [--path PATH] [target_lang ...]
Users can specify target languages for comparison against the English base.
If no languages are specified, all directories will be compared.
The path to the content directory can be specified using the --path parameter;
if not provided, './content' or '../content' is used as the default.
positional arguments:
target_lang Target language directories (e.g., ko ja fr). If empty, all directories will be compared.
options:
-h, --help show this help message and exit
--path PATH Base content directory. Default is './content'
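For readers who want to picture the interface, an argparse setup along these lines would yield a command line similar to the help text above. This is only a sketch that mirrors the documented options; the real script may be structured differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="report-outdated-by-mod.py",
        description="Report localized documents that may be outdated, "
        "based on modification date differences.",
    )
    parser.add_argument(
        "--path",
        default=None,  # assumption: the script resolves its default directory later
        help="Base content directory. Default is './content'",
    )
    parser.add_argument(
        "target_lang",
        nargs="*",  # zero or more languages; empty means compare all directories
        help="Target language directories (e.g., ko ja fr). "
        "If empty, all directories will be compared.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--path", "./content", "ko", "ja"])
print(args.path, args.target_lang)
```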
Screenshots
- ./scripts/report-outdated-by-mod.py ko
- ./scripts/report-outdated-by-mod.py
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign natalisucks for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Pull request preview available for checking
Built without sensitive environment variables
| Name | Link |
|---|---|
| Latest commit | d60e15b7e5281a50ac0c8a7da2d818ae23066303 |
| Latest deploy log | https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/661d67a93e5d0200087a1b29 |
| Deploy Preview | https://deploy-preview-45844--kubernetes-io-main-staging.netlify.app |
/area localization
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle rotten
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Hi @divya-mohan0209 @reylejano @natalisucks I think this PR is ready for approval. I believe this script is useful for localization teams as is, and the tool can be further enhanced if needed.
Since we have a much simpler script for this, i.e. scripts/lsync.sh, why bother adding a new tool which does almost the same thing?
If the new tool provides a benefit to a localization team, I think it's welcome, because we support localization teams to pick a workflow that works for them.
It's also OK to combine the lsync.sh and report-outdated-by-mod.py tools; that would need buy-in from all the localization teams that rely on either tool.
Hi @tengqm @sftim
I understand that lsync.sh is a simple tool that is already being used effectively by specific localization teams to track differences between documents. However, I believe the script tool introduced in this PR has a somewhat different purpose, as described in the PR content.
- Reporting outdated documents based on modification date differences. Estimating false alerts. The output in table format will be useful for maintaining localized documents and checking the overall status of all languages.
- Calculating the similarity between the English version and localized versions of documents. (The similarity analysis includes line counts, special character patterns, and English word usage patterns.)
Although it is possible to merge it with an existing script like lsync.sh, I think merging might not bring significant benefits to contributors who are already using the simple lsync.sh effectively for their purposes. In fact, it could introduce unnecessary inconvenience. Instead, I suggest treating the script introduced in this PR as a Proof of Concept and encouraging people to try it out and improve if necessary.
I like the idea of this, but I'm not in any localization team.
Also see https://github.com/kubernetes/website/pull/48163
/remove-lifecycle rotten
@seokho-son I've not LGTMed or approved this because:
- I don't do localization work enough to check whether this script is useful
- (AIUI) I shouldn't have access to approve this change
I recommend asking localization teams to try it out and comment.
I'd like to second this. The review bot has asked me to review this PR, and while I like the idea of this tool, I'm not on any localization team, so my opinion of the tool isn't so useful.
/uncc @nate-double-u
/lifecycle stale
@seokho-son per https://github.com/kubernetes/website/pull/45844#issuecomment-2388105427
- can you find two different localization teams where at least one member of each team finds the tool useful?
- do you have any comment on the feedback thus far?
/remove-lifecycle stale
@seokho-son the AI did find a nit.
/lifecycle stale
/lifecycle rotten
@seokho-son Do we intend to continue work on this PR?
Hi @divya-mohan0209, thanks for letting me know. I hadn't been paying attention.
The script proposed in this PR was created for managing localization documents and supporting various purposes. I believe it could be directly helpful, at least for some localization teams.
However, I think the way the tool currently works needs improvement. Comparing a localized document against its English version solely by the lastmod value can produce false positive alerts (e.g., changes in the English doc that don't actually affect the localized version), and it can consistently miss outdated content (e.g., when the English version has changed significantly but the localized version was simply updated more recently, making the gap undetectable by lastmod alone).
With this in mind, I'll work on further improving the tool, get it reviewed, and go through proper discussion, hopefully without too much delay.
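The two failure modes described in this comment can be illustrated by combining the lastmod flag with a content similarity score. The sketch below is purely hypothetical: the `classify` function, the category labels, and the 0.8 threshold are assumptions for illustration, not part of the proposed script.

```python
from datetime import date


def classify(en_lastmod: date, l10n_lastmod: date,
             similarity: float, threshold: float = 0.8) -> str:
    """Cross-check a lastmod-based flag against a similarity score in [0, 1]."""
    flagged = en_lastmod > l10n_lastmod   # what a lastmod-only check reports
    similar = similarity >= threshold     # content still tracks the English page
    if flagged and similar:
        return "possible false alert"     # English changed, localization unaffected
    if flagged:
        return "likely outdated"
    if not similar:
        return "possibly missed by lastmod"  # localized page touched recently but diverged
    return "up to date"


# Hypothetical dates and similarity scores for two pages.
print(classify(date(2024, 4, 10), date(2024, 1, 5), similarity=0.95))
print(classify(date(2024, 1, 5), date(2024, 4, 10), similarity=0.40))
```

Under these assumptions, the first call reports a possible false alert and the second reports content that a lastmod-only comparison would miss.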
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Reopen this PR with /reopen
- Mark this PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.