Failed to generate search index for zh-Hans

Open Gabirel opened this issue 1 year ago • 3 comments

Bug Report

Environment

Zola version: zola 0.20.0 (custom build with: cargo build --release --features indexing-ja --features indexing-zh)

Expected Behavior

The zh-Hans search index is generated successfully.

Current Behavior

Search index generation works for Japanese but not for zh-Hans. I believe this may be a bug in zola.

$ ls public/ | grep search_ 
search_index.en.json
search_index.ja.json

Build for zh-Hans

$ RUST_BACKTRACE=1 /Users/apple/zola/target/release/zola serve
Building site...
Checking all internal links with anchors.
> Successfully checked 0 internal link(s) with anchors.
-> Creating 23 pages (0 orphan) and 11 sections
Error: Failed to serve the site
Error: Tried to build search index for language zh-Hans which is not supported

Steps to reproduce

  1. Enable the zh-Hans language and its corresponding search index flag.
  2. Build zola with the ja and zh indexing features: cargo build --release --features indexing-ja --features indexing-zh
$ cargo build --release --features indexing-ja --features indexing-zh
    Finished `release` profile [optimized] target(s) in 0.50s
warning: the following packages contain code that will be rejected by a future version of Rust: quick-xml v0.17.2
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1`
$ du -sh ./target/release/zola
110M	./target/release/zola
  3. Run the custom-built zola serve and observe the errors.

Reference

  • commit for building: 379f6c1f622ffeb2793c11301efbcf6f49b97138
  • theme I am using: tabi

Gabirel avatar Feb 16 '25 09:02 Gabirel

I think the library expects only zh as lang

Keats avatar Feb 17 '25 20:02 Keats

Since I didn't find any spec for language codes in this post, I assume Zola uses the same language codes as the tabi theme.

Currently tabi uses the ISO 639-1 Language Code List, which uses zh-Hans.

Image

I also suspect that the library only accepts zh, not zh-Hans or zh-Hant. However, tabi fails to show the correct translation for that language code. If this were a bug in tabi, I would create another issue there.

Image

Language code for tabi could be found in here: https://github.com/welpo/tabi/tree/main/i18n

Gabirel avatar Feb 18 '25 13:02 Gabirel

https://github.com/mattico/elasticlunr-rs/blob/4db7fac70fa4d6281bf527d9fae07f5a2169f252/src/lang/mod.rs#L85-L105

The upstream crate doesn't support zh-Hans.


A hack: create a custom i18n file named zh.toml under your root dir, then copy the contents of the original translation file into it.

Image

If you want zh-Hant as well, you can simply disable search for it.


Update: continued from https://github.com/welpo/tabi/issues/519

Currently, multilingual support in zola and tabi is rather poor. Chinese or Japanese users may run into the following problems:

  • Using the official template and following the official multilingual guide, setting default_language to zh-Hans makes the build fail. It does not work out of the box.

    This is because Zola does not support Chinese or Japanese by default.

    Solution (the official documentation does mention this): compile and install Zola yourself with tokenizer support enabled: cargo install --git https://github.com/getzola/zola.git --features indexing-zh,indexing-ja zola.

Unfortunately, for Chinese users (Simplified or Traditional), the above is still not enough, because the tokenizer crate used by Zola does not accept tags like zh-Hans:

https://github.com/mattico/elasticlunr-rs/blob/4db7fac70fa4d6281bf527d9fae07f5a2169f252/src/lang/mod.rs#L85-L105

impl_language! {
    (English, en),
    (Arabic, ar, #[cfg(feature = "ar")]),
    (Chinese, zh, #[cfg(feature = "zh")]),
    // ...
}
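
To make the failure mode concrete, here is a minimal sketch of an exact-match code lookup. It is a simplified model, not the crate's real API: the function name is_supported_code is hypothetical, and the code list contains only the three entries quoted in the excerpt above.

```rust
/// Simplified model of an exact-match language lookup.
/// Assumption: only the three codes quoted in the excerpt are listed,
/// and the function name is made up for illustration.
fn is_supported_code(code: &str) -> bool {
    // The whole tag must match; there is no fallback to the primary subtag.
    matches!(code, "en" | "ar" | "zh")
}
```

With a lookup like this, "zh" is found while "zh-Hans" is not, which matches the "not supported" error reported in this issue.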

For Chinese search tokenization, you can try the temporary workaround I shared at https://github.com/getzola/zola/issues/2800#issuecomment-2817945467, but it is not perfect. One serious issue has turned up so far:

  • The Giscus comment widget stops working.

    The reason is that Giscus's server does not accept zh (the key API https://giscus.app/{lang}/widget returns 404 when lang is zh).

    A temporary workaround is to set lang to a value Giscus officially supports, such as zh-CN, instead of following the page language.
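
The Giscus workaround amounts to a small mapping from the page language to a widget language. A rough sketch of that idea follows. The function name giscus_lang is hypothetical, and in a real theme this logic would live in the template rather than in Rust. Only the zh to zh-CN pair comes from this thread.

```rust
/// Hypothetical helper: choose the lang value sent to the Giscus widget.
/// Assumption: only "zh" needs remapping; every other page language is
/// passed through unchanged.
fn giscus_lang(page_lang: &str) -> &str {
    match page_lang {
        // Bare "zh" gets a 404 from the widget endpoint, so substitute
        // a value Giscus officially supports.
        "zh" => "zh-CN",
        other => other,
    }
}
```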


I can confirm that giscus accepts lang codes like zh-Hans or zh-Hant; the core problem is that zola doesn't accept them. In the elasticlunr-rs crate, Chinese support is provided by jieba-rs, which actually only supports zh-Hans and not zh-Hant (see https://github.com/messense/jieba-rs/issues/112).

The solution may be:

  • Modify zola so that it uses zh when handling the zh-Hans lang code; see here:

    https://github.com/getzola/zola/blob/459d95acd418fd94f8c25e3aa984b8e7c93428c9/components/search/src/elasticlunr.rs#L83-L86

  • Modify elasticlunr-rs to accept the correct ISO lang codes.
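
The first option could look roughly like the sketch below. It is not Zola's actual code: primary_subtag and resolve_search_lang are hypothetical names, and the list of supported codes is passed in by the caller rather than read from elasticlunr-rs.

```rust
/// Hypothetical helper: reduce a tag such as "zh-Hans" to its primary
/// subtag "zh" by cutting at the first '-' or '_'.
fn primary_subtag(lang: &str) -> &str {
    lang.split(|c: char| c == '-' || c == '_')
        .next()
        .unwrap_or(lang)
}

/// Hypothetical resolver: try the full tag first, then fall back to the
/// primary subtag. Returns None when neither form is supported.
fn resolve_search_lang<'a>(lang: &'a str, supported: &[&str]) -> Option<&'a str> {
    if supported.contains(&lang) {
        return Some(lang);
    }
    let primary = primary_subtag(lang);
    if supported.contains(&primary) {
        Some(primary)
    } else {
        None
    }
}
```

Under these assumptions, resolve_search_lang("zh-Hans", &["en", "ja", "zh"]) resolves to "zh". The same fallback would also map zh-Hant to zh, which may not be desirable given the jieba-rs limitation mentioned above.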

cxw620 avatar Apr 21 '25 08:04 cxw620