Add tool reporting outdated l10n documents by lastmod difference

Open seokho-son opened this issue 1 year ago • 9 comments

This PR adds a tool (report-outdated-by-mod.py) that reports outdated l10n documents by Lastmod difference.

  • Ref: #42441
  • The "outdated content" warning on localization pages was introduced in https://github.com/kubernetes/website/pull/41768. That automation adds a warning message to a localized page when the English version of the page has been updated more recently than the localized page. This is determined by comparing the Lastmod of the page in English and in the given localization.

This script compares markdown files across different language directories to identify and report localized documents that may be outdated, based on modification date differences.
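
The core comparison can be sketched roughly as follows. This is a minimal illustration, not the code from report-outdated-by-mod.py: the function names `git_lastmod` and `find_outdated` are hypothetical, and it assumes that Lastmod corresponds to the date of the last git commit touching each file.

```python
import subprocess
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def git_lastmod(path: Path) -> Optional[datetime]:
    """Return the committer date of the last commit touching `path`.

    Assumption: Lastmod is derived from git history. Returns None when
    git reports no commit for the path.
    """
    out = subprocess.run(
        ["git", "log", "-1", "--format=%cI", "--", str(path)],
        capture_output=True, text=True, check=False,
    ).stdout.strip()
    return datetime.fromisoformat(out) if out else None


def find_outdated(
    en_lastmod: Dict[str, datetime],
    l10n_lastmod: Dict[str, datetime],
) -> List[Tuple[str, timedelta]]:
    """List localized pages whose English counterpart was modified later.

    Keys are paths relative to the language directory, for example
    'docs/concepts/pods.md'. Pages without an English counterpart are skipped.
    The result is sorted with the largest gap first.
    """
    report = []
    for rel_path, l10n_date in l10n_lastmod.items():
        en_date = en_lastmod.get(rel_path)
        if en_date is not None and en_date > l10n_date:
            report.append((rel_path, en_date - l10n_date))
    return sorted(report, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical dates, only to show the shape of the report.
    en = {"docs/a.md": datetime(2024, 4, 1, tzinfo=timezone.utc)}
    ko = {"docs/a.md": datetime(2024, 3, 1, tzinfo=timezone.utc)}
    for rel_path, gap in find_outdated(en, ko):
        print(f"{rel_path}: English is newer by {gap.days} days")
```

Keeping the date lookup separate from the comparison makes the comparison easy to test without a git checkout.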

It focuses primarily on:

  • Reporting outdated documents based on modification date differences.
  • Estimating false alerts.
  • Calculating the similarity between the English version and localized versions of documents. (The similarity analysis includes line counts, special character patterns, and English word usage patterns.)
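
One way such a similarity estimate could work is sketched below. The three metrics mirror the ones named in the list above, but the exact definitions, the regular expressions, the equal weighting, and all function names are assumptions made for illustration; the script in this PR may compute them differently.

```python
import re
from difflib import SequenceMatcher

# Assumption: markup characters that usually survive translation unchanged.
SPECIAL = re.compile(r"[#*`|<>{}()\[\]]")
# Assumption: ASCII words of 3+ characters (commands, API names, product names).
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_-]{2,}")


def line_count_ratio(en: str, l10n: str) -> float:
    """Ratio of the shorter line count to the longer one (1.0 when equal)."""
    a, b = len(en.splitlines()), len(l10n.splitlines())
    return min(a, b) / max(a, b) if max(a, b) else 1.0


def special_char_similarity(en: str, l10n: str) -> float:
    """Compare the sequences of markup characters in both documents."""
    a = "".join(SPECIAL.findall(en))
    b = "".join(SPECIAL.findall(l10n))
    return SequenceMatcher(None, a, b).ratio()


def english_word_overlap(en: str, l10n: str) -> float:
    """Fraction of ASCII words in the localized text that also occur in English."""
    l10n_words = set(WORD.findall(l10n))
    if not l10n_words:
        return 1.0
    return len(l10n_words & set(WORD.findall(en))) / len(l10n_words)


def similarity(en: str, l10n: str) -> float:
    """Unweighted mean of the three metrics, in the range 0.0 to 1.0."""
    return (
        line_count_ratio(en, l10n)
        + special_char_similarity(en, l10n)
        + english_word_overlap(en, l10n)
    ) / 3


if __name__ == "__main__":
    en_doc = "# Pods\n\nRun `kubectl get pods` to list Pods.\n"
    ko_doc = "# 파드\n\n`kubectl get pods` 를 실행하여 파드를 나열한다.\n"
    print(f"similarity: {similarity(en_doc, ko_doc):.2f}")
```

Under this sketch, a page that is flagged by date but still scores high is a candidate false alert, while a low score suggests the English page has structurally diverged.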

The table-style output should be useful for maintaining localized documents and for checking the overall status of all languages.

How to use

$ python ./scripts/report-outdated-by-mod.py --help
Usage: report-outdated-by-mod.py [-h] [--path PATH] [target_lang ...]

    Users can specify target languages for comparison against the English base.
    If no languages are specified, all directories will be compared.

    The path to the content directory can be specified using the --path parameter; 
    if not provided, './content' or '../content' is used as the default.

positional arguments:
  target_lang  Target language directories (e.g., ko ja fr). If empty, all directories will be compared.

options:
  -h, --help   show this help message and exit
  --path PATH  Base content directory. Default is './content'
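
For readers who want to experiment with the same interface, the command line above could be reproduced with `argparse` roughly like this. It is a sketch based only on the help text: the helper names `build_parser`, `resolve_content_dir`, and `resolve_languages` are hypothetical, and excluding `en` from the default language list is an assumption.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the usage line shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="report-outdated-by-mod.py",
        description="Report localized documents that may be outdated.",
    )
    parser.add_argument(
        "target_lang", nargs="*",
        help="Target language directories (e.g., ko ja fr). "
             "If empty, all directories will be compared.",
    )
    parser.add_argument(
        "--path", default=None,
        help="Base content directory. Default is './content'",
    )
    return parser


def resolve_content_dir(path: Optional[str]) -> Optional[Path]:
    """Use --path when given, otherwise try './content' and then '../content'."""
    candidates = [Path(path)] if path else [Path("./content"), Path("../content")]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


def resolve_languages(content_dir: Path, target_langs: List[str]) -> List[str]:
    """Return the requested languages, or every directory except 'en'."""
    if target_langs:
        return list(target_langs)
    return sorted(
        d.name for d in content_dir.iterdir() if d.is_dir() and d.name != "en"
    )


if __name__ == "__main__":
    args = build_parser().parse_args()
    content_dir = resolve_content_dir(args.path)
    if content_dir is None:
        raise SystemExit("content directory not found")
    print(resolve_languages(content_dir, args.target_lang))
```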

Screenshots

  • ./scripts/report-outdated-by-mod.py ko

image

image

  • ./scripts/report-outdated-by-mod.py

image

seokho-son avatar Apr 11 '24 19:04 seokho-son

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign natalisucks for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Apr 11 '24 19:04 k8s-ci-robot

Pull request preview available for checking

Built without sensitive environment variables

Name Link
Latest commit d60e15b7e5281a50ac0c8a7da2d818ae23066303
Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/661d67a93e5d0200087a1b29
Deploy Preview https://deploy-preview-45844--kubernetes-io-main-staging.netlify.app

netlify[bot] avatar Apr 11 '24 19:04 netlify[bot]

/area localization

seokho-son avatar Apr 11 '24 19:04 seokho-son

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 14 '24 18:07 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Aug 13 '24 18:08 k8s-triage-robot

Hi @divya-mohan0209 @reylejano @natalisucks I think this PR is ready for approval. I believe this script is useful for localization teams as is, and the tool can be further enhanced if needed.

seokho-son avatar Sep 11 '24 10:09 seokho-son

Since we have a much simpler script for this, i.e. scripts/lsync.sh, why bother adding a new tool which does almost the same thing?

tengqm avatar Sep 11 '24 11:09 tengqm

If the new tool provides a benefit to a localization team, I think it's welcome, because we support localization teams in picking a workflow that works for them.

It's also OK to combine the lsync.sh and report-outdated-by-mod.py tools; that would need buy-in from all the localization teams that rely on either tool.

sftim avatar Sep 11 '24 12:09 sftim

Hi @tengqm @sftim

I understand that lsync.sh is a simple tool that is already being used effectively by specific localization teams to track differences between documents. However, I believe the script tool introduced in this PR has a somewhat different purpose, as described in the PR content.

  • Reporting outdated documents based on modification date differences. Estimating false alerts. The output in table format will be useful for maintaining localized documents and checking the overall status of all languages.
  • Calculating the similarity between the English version and localized versions of documents. (The similarity analysis includes line counts, special character patterns, and English word usage patterns.)

Although it is possible to merge it with an existing script like lsync.sh, I think merging might not bring significant benefits to contributors who are already using the simple lsync.sh effectively for their purposes. In fact, it could introduce unnecessary inconvenience. Instead, I suggest treating the script introduced in this PR as a Proof of Concept and encouraging people to try it out and improve it if necessary.

seokho-son avatar Sep 20 '24 07:09 seokho-son

I like the idea of this, but I'm not in any localization team.

Also see https://github.com/kubernetes/website/pull/48163

/remove-lifecycle rotten

sftim avatar Oct 02 '24 09:10 sftim

@seokho-son I've not LGTMed or approved this because:

  • I don't do enough localization work to check whether this script is useful
  • (AIUI) I shouldn't have access to approve this change

I recommend asking localization teams to try it out and comment.

sftim avatar Oct 02 '24 09:10 sftim

I'd like to second this: the review bot has asked me to review this PR. While I like the idea of this tool, I'm not on any localization team, so my opinion of the tool isn't very useful.

/uncc @nate-double-u

nate-double-u avatar Oct 23 '24 23:10 nate-double-u

/lifecycle stale

k8s-triage-robot avatar Jan 22 '25 00:01 k8s-triage-robot

@seokho-son per https://github.com/kubernetes/website/pull/45844#issuecomment-2388105427

  • can you find two different localization teams where at least one member of each team finds the tool useful?
  • do you have any comment on the feedback thus far?

sftim avatar Feb 07 '25 12:02 sftim

/remove-lifecycle stale

sftim avatar Feb 07 '25 12:02 sftim

@seokho-son the AI did find a nit.

sftim avatar Mar 14 '25 12:03 sftim

/lifecycle stale

k8s-triage-robot avatar Jun 12 '25 12:06 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Jul 12 '25 13:07 k8s-triage-robot

@seokho-son Do we intend to continue work on this PR?

divya-mohan0209 avatar Jul 15 '25 04:07 divya-mohan0209

Hi @divya-mohan0209, thanks for letting me know. I hadn’t been paying attention.

The script proposed in this PR was created for managing localization documents and supporting various purposes. I believe it could be directly helpful, at least for some localization teams.

However, I think the way the tool currently processes things needs improvement. If we compare a localized document against its English version solely based on the lastmod value, the result can include false positive alerts (e.g., changes in the English doc that don’t actually affect the localized version), and it can consistently miss outdated content (e.g., when the English version has changed significantly but the localized version was simply updated recently, making the gap undetectable by lastmod alone).

With this in mind, I’ll work on further improving the tool, get it reviewed, and take it through proper discussion, hopefully without too much delay.

seokho-son avatar Jul 16 '25 13:07 seokho-son

/close

k8s-triage-robot avatar Aug 15 '25 13:08 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

k8s-ci-robot avatar Aug 15 '25 13:08 k8s-ci-robot