icu4x
Decide on longer term plan for ICU4X web site
The build artifacts were cleaned up in #3054, but we still don't have a good plan for the landing page and how everything is tied together.
Discuss with:
- @echeran
- @sffc
Just for completeness: the reason why I consider this separate from #3054 is that it can be solved entirely in the icu4x-docs repo, we can go full web-design there and it's independent of icu4x development itself. If we want to link to docs, we already decided that linking to the latest published version on docs.rs is preferable over the main artifacts of this repo.
I think #2929 should also be a part of this.
Discussion:
- Landing pages for all Unicode projects are not in a great state, but let's scope this to just the ICU4X project
- Problem with publishing build artifacts (like tip-of-main docs) is that they could get indexed, but we want people to go to docs.rs in general
- The Rust Book template is a possibility, but it sort of takes over the whole web site; it wants to be your primary theme. We don't necessarily need that
- @echeran will sketch out a plan.
Plan for work, as discussed, is in https://github.com/unicode-org/icu4x/pull/4357#issuecomment-1839128635
TC decision: we want this for 2.0 publicity but it doesn't need to block the 2.0 code release
Summary of deep dive between @sffc and @echeran:
- Some content will be unversioned (such as the home page), and other content will be versioned (such as tutorials and FFI docs).
- @sffc wants the versioned content to be reproducibly generated by a script.
- The script depends on multiple moving targets, including the ICU4X version, the Astro version, and the "web site template" version.
- It is a lot of work to maintain separate scripts that work with these moving targets.
- Therefore, we maintain only the following scripts:
- Script that generates from the latest ICU4X to the latest Astro and template.
- Script that upgrades versioned content whenever Astro or the template changes in a way that requires upgrading versioned content. (This might not happen right away, but we have a path when we want it.)
- The script should be written in a General-Purpose Programming Language (GPPL), because it is more complex than what would be suitable for a Command-Automation Language (like Bash or Duckscript), but the moving targets mean that the script needs to be flexible, so a System Programming Language is not the most appropriate.
- The best GPPL seems to be JavaScript or TypeScript, because (1) it is familiar to the team, (2) it is a Web-related language for a Web site, and (3) Astro is already JS and involves npm.
- @echeran Review of previous conclusion. Explanation of automating copying: "unversioned" content lives in icu4x-docs and is updated in PRs there. "Versioned" content (tutorials, etc.) lives in upstream icu4x and gets copied over by a script at release time into a new subfolder in the website.
- @robertbastian We still need a human to review the website to ensure that things look correct, verify that links all work, etc. We probably need a staging environment to validate the website before release.
- @echeran Yes. But a script to automate the copying of the "versioned content" will help with the reproducibility of that step. The hard, time-consuming part is not writing such a script; it is the base setup of the static site generator tool and its configuration, which has to come first.
- @Manishearth I would like the ICU4X tutorials as close to the website as possible. The automated copying is doable and should be done.
- @echeran We will have a separate PR for unversioned changes. And a PR for each release's new versioned content additions. We wouldn't merge the PR for the versioned content unless and until we have a human review.
- @sffc I think the extra staging branch is not necessary. I'm fine with a 98% SLA; we don't need 99.9%. Fixing bugs on the live website is fine. The staging branch doesn't completely resolve bugs; things like relative paths or CORS headers or something could be broken. But it's fine; I'm not opposed, I just think it's extra overhead we don't need.
Proposal:
- We have a website repo with versioned and unversioned content living under main.
- Unversioned content is added by making PRs to the repo
- Versioned content is added during the release process of ICU4X by calling some script ./get_versioned_content.sh 2.0.0-beta2 <git ref>. This goes in a PR that can be merged
- Main can be autopublished or have some staging situation
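To make the "versioned content" step concrete, here is a minimal TypeScript sketch of what such a copy script might do. It is not the actual script. The function names, the tutorials directory name, and the choice to fold patch releases into a major.minor folder are all assumptions for illustration, and it assumes the icu4x checkout has already been switched to the desired git ref. It uses fs.cpSync, which needs Node 16.7 or later.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical naming rule: stable releases map to "major.minor" so a patch
// release overwrites the same folder; prereleases keep their full version.
function versionFolder(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)(-[0-9A-Za-z.-]+)?$/.exec(version);
  if (match === null) {
    throw new Error(`not a semver-style version: ${version}`);
  }
  const [, major, minor, , prerelease] = match;
  return prerelease ? version : `${major}.${minor}`;
}

// Copies <icu4xCheckout>/tutorials into <siteRoot>/<folder>/tutorials.
// The "tutorials" directory name is an assumption, not a confirmed layout.
function copyVersionedContent(
  icu4xCheckout: string,
  siteRoot: string,
  version: string,
): string {
  const source = path.join(icu4xCheckout, "tutorials");
  const target = path.join(siteRoot, versionFolder(version), "tutorials");
  // Remove any previous copy so the result depends only on the inputs.
  fs.rmSync(target, { recursive: true, force: true });
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.cpSync(source, target, { recursive: true });
  return target;
}
```

Deleting the target before copying is what keeps the step reproducible: running it twice against the same checkout yields the same folder, and the resulting diff can go into a PR for human review.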
Script that generates from the latest ICU4X to the latest Astro and template.
We'll also need a way to handle patch releases. So I'd say something that works with latest astro and tagged ICU4X, perhaps?
Aside from that, the plan seems fine to me.
https://github.com/unicode-org/icu4x/issues/3424
https://github.com/unicode-org/icu4x/pull/4357#issuecomment-1839128635
Summary of what's agreed so far:
- The website will live in the icu4x-docs repo
- icu4x-docs contains all the pretty website stuff and can be updated at any time
- icu4x-docs already has the icu4x.unicode.org domain configured
  - At the moment, the site at icu4x-docs then redirects to the latest dev rustdoc
- icu4x-docs content & versioning
  - icu4x-docs will contain both content that is unversioned (such as the home page), and other content that is versioned (such as tutorials and FFI docs).
  - Versioned content will be organized into subfolders named for each release version
  - What is versioned may be determined later, we know we want FFI docs to be versioned. Tutorials TBD. We will not host rustdoc at all.
  - We will be very careful about noindex for versioned subfolders.
    - Old releases should not get indexed
    - The latest release should get indexed under /latest, or should get indexed under /1.<latest> in a way that is not "sticky" in search engines
    - Ideally we do not have to duplicate a /latest and a /1.<latest>.
      - This would need a redirect, we may wish to do research on how often noindex gets reconsumed by crawlers when it comes to redirects.
      - Or we do what docs.rs does; which is that /latest is indexed, and it has a permalink button to 1.<latest>
- Maintaining versioned content
  - @sffc wants the versioned content to be reproducibly generated by a script.
  - The script depends on multiple moving targets, including the ICU4X version, the Astro version, and the "web site template" version.
  - It is a lot of work to maintain separate scripts that work with these moving targets.
  - Therefore, we maintain only the following scripts:
    - Script that generates from the latest ICU4X to the latest Astro and template.
    - Script that upgrades versioned content whenever Astro or the template changes in a way that requires upgrading versioned content. (This might not happen right away, but we have a path when we want it.)
  - The script should be written in a General-Purpose Programming Language (GPPL), because it is more complex than what would be suitable for a Command-Automation Language (like Bash or Duckscript), but the moving targets mean that the script needs to be flexible, so a System Programming Language is not the most appropriate.
  - The best GPPL seems to be JavaScript or TypeScript, because (1) it is familiar to the team, (2) it is a Web-related language for a Web site, and (3) Astro is already JS and involves npm.
- We may rename the repo to icu4x-website or something
- Benchmark data files will be copied in every release. TBD how and whether to display them as HTML files; there may be some worth in having a "releases only" benchmark page for the external website (rather than "one data point per commit", which is more internally useful)
- the repo's dual tutorial test (against head, crates.io) will be changed as follows
  - switch the tutorials-local test to be the only tutorials CI test in the icu4x repo (deleting the tutorials-cratesio test)
  - The existing tutorials code gets versioned like the artifacts; for example, at /1.4/crates in the icu4x-docs repo
  - icu4x-docs gets a CI job that provides coverage equivalent to the two existing CI jobs: one that tests tutorials against crates.io, and one that tests against icu4x component crates at the main tag. The job runs as a nightly cron CI job.
  - We should consider a better semver test on the main repo but that is a totally separate discussion.
- keep unicode-org.github.io/icu4x but set it noindex to avoid duplicate google search results; don't expose links to GCP to anyone other than core ICU4X contributors
- Updates
  - We have a website repo with versioned and unversioned content living under main.
  - Content is added by making PRs & merging PRs
  - Versioned content is added by a PR in icu4x-docs that can be created by some script that runs in icu4x
  - Main can be autopublished or we have some staging situation
- Questions for discussion:
  - Can we incrementally decide on publishing from main and add the staging situation indirection later on?
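The noindex rule for versioned subfolders in the summary above could be expressed as a small helper in the site template. This is only a sketch under stated assumptions: the function names are hypothetical, versions are compared as plain strings, and it does not settle the open question of whether /latest is a redirect or a separately indexed copy. The only external fact it relies on is the standard robots meta tag.

```typescript
type RobotsDirective = "index, follow" | "noindex";

// Old releases are kept out of search indexes; only the latest is indexable.
function robotsDirective(pageVersion: string, latestVersion: string): RobotsDirective {
  return pageVersion === latestVersion ? "index, follow" : "noindex";
}

// Renders the tag a page layout would place in <head>.
function robotsMetaTag(pageVersion: string, latestVersion: string): string {
  const directive = robotsDirective(pageVersion, latestVersion);
  return `<meta name="robots" content="${directive}">`;
}
```

Because the directive is derived from the current latest version at build time, cutting a new release and rebuilding automatically flips the previous release to noindex, with no per-folder bookkeeping.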
Conclusion:
- We start creating a script based on the current version of ICU4X (ICU4X 2.0) and the current version of Astro.
- We update the script to handle version upgrades in ICU4X and Astro going forward.
- We would like to have ICU4X 1.5 content, ideally generated by a script.
- Example weighting: 5 points for 2.0 content, 3 points for 2.0 script, 2 points for 1.5 content, and 1 point for 1.5 script
LGTM: @sffc @Manishearth @echeran @robertbastian
@echeran suggests adding WASM demos in an iframe.
@nekevss suggested having a "data explorer".
Otherwise, I think what we have is good enough for 2.0, and we should keep improving it.
We have most of this implemented. I think there's still room to clean up the web site tooling but we can do that iteratively, including on the next ICU4X release. Issues should probably be tracked on the icu4x-docs repo moving forward.