Failed to generate search index for zh-Hans
Bug Report
Environment
Zola version: zola 0.20.0 (custom build with: cargo build --release --features indexing-ja --features indexing-zh)
Expected Behavior
The zh-Hans search index is generated successfully.
Current Behavior
Index generation works for Japanese but not for zh-Hans. I believe this may be a bug in Zola.
$ ls public/ | grep search_
search_index.en.json
search_index.ja.json
Build for zh-Hans
$ RUST_BACKTRACE=1 /Users/apple/zola/target/release/zola serve
Building site...
Checking all internal links with anchors.
> Successfully checked 0 internal link(s) with anchors.
-> Creating 23 pages (0 orphan) and 11 sections
Error: Failed to serve the site
Error: Tried to build search index for language zh-Hans which is not supported
Steps to reproduce
- enable zh-Hans language and its corresponding search index flag
- build zola with ja and zh index feature with:
cargo build --release --features indexing-ja --features indexing-zh
$ cargo build --release --features indexing-ja --features indexing-zh
Finished `release` profile [optimized] target(s) in 0.50s
warning: the following packages contain code that will be rejected by a future version of Rust: quick-xml v0.17.2
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1`
$ du -sh ./target/release/zola
110M ./target/release/zola
- run the custom-built zola serve, and get errors.
Reference
- commit for building: 379f6c1f622ffeb2793c11301efbcf6f49b97138
- theme I am using: tabi
I think the library expects only zh as lang
Since I couldn't find any spec for language codes in this post, I assume Zola uses the same language codes as the tabi theme.
Currently tabi uses the ISO 639-1 language code list, which uses zh-Hans.
I suspect that the library only accepts zh, not zh-Hans or zh-Hant. However, tabi fails to show the correct translation for that language code. If that turns out to be a bug in tabi, I will create another issue there.
The language codes for tabi can be found here: https://github.com/welpo/tabi/tree/main/i18n
https://github.com/mattico/elasticlunr-rs/blob/4db7fac70fa4d6281bf527d9fae07f5a2169f252/src/lang/mod.rs#L85-L105
The upstream crate doesn't support zh-Hans.
A hack: create a custom i18n file named zh.toml under your site's root dir, then copy the content of the original one into it.
If you want zh-Hant as well, you can simply disable search for it.
Update: continued from https://github.com/welpo/tabi/issues/519
Currently, multi-language support in zola and tabi is rather poor. If you are a Chinese or Japanese user, these are some of the problems you may run into:
- Using the official template and following the official multilingual guide, setting default_language to zh-Hans makes the content build fail. It cannot be used out of the box.
  This is because Zola does not support Chinese or Japanese by default.
  Solution (the official documentation actually covers this): compile and install Zola yourself with tokenization support enabled:
  cargo install --git https://github.com/getzola/zola.git --features indexing-zh,indexing-ja zola
Unfortunately, for Chinese users (Simplified or Traditional), the approach above is still not enough, because the upstream crate Zola uses for tokenization does not accept notations like zh-Hans:
https://github.com/mattico/elasticlunr-rs/blob/4db7fac70fa4d6281bf527d9fae07f5a2169f252/src/lang/mod.rs#L85-L105
impl_language! {
(English, en),
(Arabic, ar, #[cfg(feature = "ar")]),
(Chinese, zh, #[cfg(feature = "zh")]),
// ...
}
To get Chinese search working, you can try the temporary workaround I shared here: https://github.com/getzola/zola/issues/2800#issuecomment-2817945467. It is not perfect, though, and I have since found one serious problem:
- The giscus comment widget stops working.
  The reason is that the giscus server does not accept zh (the key API https://giscus.app/{lang}/widget returns 404 when lang is zh).
  A temporary workaround is to set lang to a value that giscus officially supports, such as zh-CN, instead of following the page language.
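The giscus workaround above amounts to a small mapping from the page language to a value the widget accepts. Here is a minimal sketch of that logic in Rust, assuming the page language has already been shortened to zh for the search index. The function name giscus_lang is hypothetical and is not part of zola, tabi, or giscus; in a real site this logic would live in the theme's template or config.

```rust
/// Hypothetical helper (not part of zola, tabi, or giscus): choose the
/// `lang` value to hand to the giscus widget for a given page language.
pub fn giscus_lang(page_lang: &str) -> &str {
    match page_lang {
        // The report above says https://giscus.app/zh/widget returns 404,
        // so pin a value giscus is reported to support instead.
        "zh" => "zh-CN",
        // Every other language code is passed through unchanged.
        other => other,
    }
}
```

With this mapping, a page built with lang zh would request the widget with zh-CN, while ja or en pages keep following the page language.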
I can confirm that giscus accepts lang codes like zh-Hans or zh-Hant; the core problem is that zola doesn't accept them. In the elasticlunr-rs crate, Chinese support is provided by jieba-rs, which actually only supports zh-Hans and not zh-Hant (see https://github.com/messense/jieba-rs/issues/112).
The solution may be:
- Modify zola: when handling the zh-Hans lang code, use zh instead, see here: https://github.com/getzola/zola/blob/459d95acd418fd94f8c25e3aa984b8e7c93428c9/components/search/src/elasticlunr.rs#L83-L86
- Modify elasticlunr-rs, correct the ISO lang code.
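The first option could look roughly like the following. This is only a sketch of the idea, not zola's actual code: primary_language_subtag is a hypothetical helper, and it assumes the search backend is looked up by the bare language code (zh, ja, en) as in the impl_language! list quoted earlier.

```rust
/// Hypothetical helper (not part of zola or elasticlunr-rs): reduce a
/// language tag such as "zh-Hans" to its primary subtag "zh" before it
/// is used to look up a search-index language.
pub fn primary_language_subtag(lang: &str) -> &str {
    // Everything after the first '-' or '_' is script or region detail
    // ("Hans", "Hant", "CN", ...), which the lookup does not understand.
    lang.split(|c: char| c == '-' || c == '_')
        .next()
        .unwrap_or(lang)
}
```

Note that this would also map zh-Hant to zh, so Traditional Chinese content would be tokenized by a segmenter that, per the jieba-rs issue linked above, only supports Simplified Chinese.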