Decide on longer term plan for ICU4X web site

Open sffc opened this issue 2 years ago • 9 comments

The build artifacts were cleaned up in #3054, but we still don't have a good plan for the landing page and how everything is tied together.

Discuss with:

  • @echeran
  • @sffc

sffc avatar May 11 '23 18:05 sffc

Just for completeness: the reason I consider this separate from #3054 is that it can be solved entirely in the icu4x-docs repo; we can go full web-design there, and it's independent of icu4x development itself. If we want to link to docs, we already decided that linking to the latest published version on docs.rs is preferable to linking to the main artifacts of this repo.

I think #2929 should also be a part of this.

robertbastian avatar May 12 '23 12:05 robertbastian

Discussion:

  • Landing pages for all Unicode projects are not in a great state, but let's scope this to just the ICU4X project
  • Problem with publishing build artifacts (like tip-of-main docs) is that they could get indexed, but we want people to go to docs.rs in general
  • The Rust Book template is a possibility but it sort-of takes over the whole web site; it wants to be your primary theme. We don't necessarily need that.
  • @echeran will sketch out a plan.

sffc avatar Jun 15 '23 19:06 sffc

Plan for work, as discussed, is in https://github.com/unicode-org/icu4x/pull/4357#issuecomment-1839128635

echeran avatar Feb 29 '24 19:02 echeran

TC decision: we want this for 2.0 publicity but it doesn't need to block the 2.0 code release

sffc avatar Sep 17 '24 17:09 sffc

Summary of deep dive between @sffc and @echeran:

  • Some content will be unversioned (such as the home page), and other content will be versioned (such as tutorials and FFI docs).
  • @sffc wants the versioned content to be reproducibly generated by a script.
  • The script depends on multiple moving targets, including the ICU4X version, the Astro version, and the "web site template" version.
  • It is a lot of work to maintain separate scripts that work with these moving targets.
  • Therefore, we maintain only the following scripts:
    • Script that generates from the latest ICU4X to the latest Astro and template.
    • Script that upgrades versioned content whenever Astro or the template changes in a way that requires upgrading versioned content. (This might not happen right away, but we have a path when we want it.)
  • The script should be written in a General-Purpose Programming Language (GPPL), because it is more complex than what would be suitable for a Command-Automation Language (like Bash or Duckscript), but the moving targets mean that the script needs to be flexible, so a System Programming Language is not the most appropriate.
  • The best GPPL seems to be JavaScript or TypeScript, because (1) it is familiar to the team, (2) it is a Web-related language for a Web site, and (3) Astro is already JS and involves npm.

sffc avatar Feb 11 '25 11:02 sffc

  • @echeran Review of previous conclusion. Explanation of automating copying: "unversioned" content lives in icu4x-docs and is updated in PRs there. "versioned" content (tutorials, etc.) lives in upstream icu4x and gets copied over by a script at release time into a new subfolder in the website.
  • @robertbastian We still need a human to review the website to ensure that things look correct, verify that links all work, etc. We probably need a staging environment to validate the website before release.
  • @echeran Yes. But a script to automate the copying of the "versioned content" will help with the reproducibility of that step. The hard or time-consuming part is not writing such a script but the base setup of the static site generator and its configuration, which has to come first.
  • @Manishearth I would like the ICU4X tutorials as close to the website as possible. The automated copying is doable and should be done.
  • @echeran We will have a separate PR for unversioned changes. And a PR for each release's new versioned content additions. We wouldn't merge the PR for the versioned content unless and until we have a human review.
  • @sffc I think the extra staging branch is not necessary. I'm fine with a 98% SLA; we don't need 99.9%. Fixing bugs on the live website is fine. The staging branch doesn't completely resolve bugs; things like relative paths or CORS headers or something could be broken. But it's fine; I'm not opposed, I just think it's extra overhead we don't need.

Proposal:

  • We have a website repo with versioned and unversioned content living under main.
  • Unversioned content is added by making PRs to the repo
  • Versioned content is added during the release process of ICU4X by calling some script ./get_versioned_content.sh 2.0.0-beta2 <git ref>. This goes in a PR that can be merged
  • Main can be autopublished or have some staging situation

Manishearth avatar Feb 11 '25 12:02 Manishearth

"Script that generates from the latest ICU4X to the latest Astro and template."

We'll also need a way to handle patch releases. So I'd say something that works with latest astro and tagged ICU4X, perhaps?

Aside from that, the plan seems fine to me.

Manishearth avatar Feb 11 '25 13:02 Manishearth

https://github.com/unicode-org/icu4x/issues/3424

https://github.com/unicode-org/icu4x/pull/4357#issuecomment-1839128635

Summary of what's agreed so far:

  • The website will live in the icu4x-docs repo

  • icu4x-docs contains all the pretty website stuff and can be updated at any time

  • icu4x-docs already has the icu4x.unicode.org domain configured

    • At the moment, the site at icu4x-docs then redirects to the latest dev rustdoc
  • icu4x-docs content & versioning

    • icu4x-docs will contain both content that is unversioned (such as the home page), and other content that is versioned (such as tutorials and FFI docs).
    • Versioned content will be organized into subfolders named for each release version
    • What is versioned may be determined later; we know we want FFI docs to be versioned. Tutorials TBD. We will not host rustdoc at all.
    • We will be very careful about noindex for versioned subfolders.
      • Old releases should not get indexed
      • The latest release should get indexed under /latest, or should get indexed under /1.<latest> in a way that is not "sticky" in search engines
      • Ideally we do not have to duplicate a /latest and a /1.<latest>.
        • This would need a redirect; we may wish to research how often crawlers re-read noindex when redirects are involved.
        • Or we do what docs.rs does, which is that /latest is indexed and has a permalink button to 1.<latest>
  • Maintaining versioned content

    • @sffc wants the versioned content to be reproducibly generated by a script.
    • The script depends on multiple moving targets, including the ICU4X version, the Astro version, and the "web site template" version.
    • It is a lot of work to maintain separate scripts that work with these moving targets.
    • Therefore, we maintain only the following scripts:
      • Script that generates from the latest ICU4X to the latest Astro and template.
      • Script that upgrades versioned content whenever Astro or the template changes in a way that requires upgrading versioned content. (This might not happen right away, but we have a path when we want it.)
    • The script should be written in a General-Purpose Programming Language (GPPL), because it is more complex than what would be suitable for a Command-Automation Language (like Bash or Duckscript), but the moving targets mean that the script needs to be flexible, so a System Programming Language is not the most appropriate.
    • The best GPPL seems to be JavaScript or TypeScript, because (1) it is familiar to the team, (2) it is a Web-related language for a Web site, and (3) Astro is already JS and involves npm.
  • We may rename the repo to icu4x-website or something

  • Benchmark data files will be copied in every release. TBD how and whether to display them as HTML files; there may be some worth in having a "releases only" benchmark page for the external website (rather than "one data point per commit", which is more internally useful)

  • the repo's dual tutorial test (against head, crates.io) will be changed as follows

    • switch the tutorials-local test to be the only tutorials CI test in the icu4x repo (deleting the tutorials-cratesio test)
    • The existing tutorials code gets versioned like the artifacts; for example, at /1.4/crates in the icu4x-docs repo
    • icu4x-docs gets a CI job that provides coverage equivalent to the two existing CI jobs: one that tests tutorials against crates.io, and one that tests against icu4x component crates at the main tag. The job runs nightly via cron.
    • We should consider a better semver test on the main repo but that is a totally separate discussion.
  • keep unicode-org.github.io/icu4x but set it noindex to avoid duplicate google search results; don't expose links to GCP to anyone other than core ICU4X contributors

  • Updates

    • We have a website repo with versioned and unversioned content living under main.
    • Content is added by making and merging PRs
    • Versioned content is added by a PR in icu4x-docs that can be created by some script that runs in icu4x
    • Main can be autopublished or we have some staging situation
  • Questions for discussion:

    • Can we incrementally decide on publishing from main and add the staging situation indirection later on?
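
To make the noindex rules in the summary above more concrete, here is a minimal TypeScript sketch of the docs.rs-style option, in which /latest and unversioned pages stay indexable while pinned version subfolders are marked noindex. It is only one of the options listed and rests on assumptions: the function names `robotsDirective` and `robotsMetaTag` and the folder-naming pattern are hypothetical, and the real site may implement this differently (for example through Astro layouts or HTTP headers).

```typescript
// Hypothetical sketch of the docs.rs-style option: only /latest and
// unversioned pages are indexable; pinned version folders are noindex.

// Matches a leading path segment such as "1.5" or "2.0.0-beta2" (assumed naming).
const VERSION_SEGMENT = /^\d+\.\d+(\.\d+)?(-[0-9A-Za-z.]+)?$/;

function robotsDirective(urlPath: string): "index, follow" | "noindex" {
  // Take the first non-empty path segment, e.g. "1.5" for "/1.5/tutorials/".
  const firstSegment = urlPath.split("/").filter((s) => s !== "")[0] ?? "";
  return VERSION_SEGMENT.test(firstSegment) ? "noindex" : "index, follow";
}

// Renders the standard robots meta tag for a page at the given path.
function robotsMetaTag(urlPath: string): string {
  return `<meta name="robots" content="${robotsDirective(urlPath)}">`;
}
```

Under these assumptions a page at `/1.5/tutorials/` would be marked noindex, while the home page and anything under `/latest/` would remain indexable.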

Conclusion:

  • We start creating a script based on the current version of ICU4X (ICU4X 2.0) and the current version of Astro.
  • We update the script to handle version upgrades in ICU4X and Astro going forward.
  • We would like to have ICU4X 1.5 content, ideally generated by a script.
    • Example weighting: 5 points for 2.0 content, 3 points for 2.0 script, 2 points for 1.5 content, and 1 point for 1.5 script

LGTM: @sffc @Manishearth @echeran @robertbastian

Manishearth avatar Feb 14 '25 09:02 Manishearth

@echeran suggests adding WASM demos in an iframe.

@nekevss suggested having a "data explorer".

Otherwise, I think what we have is good enough for 2.0, and we should keep improving it.

sffc avatar May 01 '25 18:05 sffc

We have most of this implemented. I think there's still room to clean up the web site tooling but we can do that iteratively, including on the next ICU4X release. Issues should probably be tracked on the icu4x-docs repo moving forward.

sffc avatar Aug 22 '25 22:08 sffc