liquid: when a binding's key name is Chinese, the template cannot be parsed

Checklist

[x] I have searched the issue list
[ ] I have tested my example against Shopify Liquid. (This isn't necessary if the actual behavior is a panic, or an error for which IsTemplateError returns false.)

Expected Behavior

bindings := map[string]interface{}{
    "app": "test_app",
    "描述": "content",
}
template := `应用:{{app}} 描述:{{ 描述 }}`
expected := `应用:test_app 描述:content`

Actual Behavior

syntax error in "描述" in {{ 描述 }}

Detailed Description

Possible Solution

Feb 17 '22 07:02 StrangeYear

I'm not even sure a key name can have a slash '/' in it..

May 03 '24 00:05 SleepyBrett

Thank you for the bug report! I finally have time to work on it.

Shopify Liquid does not support Chinese variable names either. Shopify/liquid#31 was closed as "won't fix". (The history is actually richer: it was fixed, and then the fix was reverted because of its performance impact.)

However, I'm not comfortable with that as a justification for this library not to support them. I'll look into what it would take to fix this.

The experience of Shopify/liquid#31 suggests that this work should be accompanied by benchmarks.

Aug 30 '25 02:08 osteele

Here's an analysis and what it would take to fix this:

Current State

This Implementation

The Ragel lexer in expressions/scanner.rl uses ASCII-only character classes
Pattern: identifier = (alpha | '_') . (alnum | '_' | '-')* '?'?
alpha and alnum in Ragel only match ASCII characters
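
To see why this pattern rejects 描述, here is a rough Go approximation of the identifier rule as a regular expression. It is an illustration only: the real lexer is generated by Ragel, and asciiIdentifier is a hypothetical name that does not exist in the library.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiIdentifier approximates the Ragel rule
//
//	identifier = (alpha | '_') . (alnum | '_' | '-')* '?'?
//
// with alpha and alnum spelled out as ASCII ranges.
// Hypothetical helper for illustration, not part of the library.
var asciiIdentifier = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_-]*\??$`)

func main() {
	for _, name := range []string{"app", "is_ok?", "描述"} {
		fmt.Printf("%q matches: %v\n", name, asciiIdentifier.MatchString(name))
	}
}
```

The ASCII names match, while 描述 fails on its very first byte, which is what surfaces as the syntax error in the report.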

Shopify Liquid

Also doesn't officially support Unicode in variable names
Uses similar regex patterns without Unicode support: VariableSegment = /[\w\-]/
The community requested this feature (Shopify/liquid#31); a fix was merged but later reverted because of its performance impact

What It Would Take to Fix

The Core Issue

The Ragel lexer uses ASCII-only character classes that need to be extended for Unicode support.

Implementation Approaches

Option 1: Ragel Unicode Support (Complex)

Generate Unicode character classes using Ragel's unicode2ragel.rb script
Create unicode.rl file with Unicode character class definitions (ualpha, ualnum)
Modify scanner.rl to include and use Unicode classes:
- Replace alpha with ualpha
- Replace alnum with ualnum
Regenerate scanner.go using go generate

Option 2: Post-Processing with Go's Unicode Support (Simpler)

Keep Ragel for basic tokenization but allow wider character sets
Add Unicode validation in the Go code after Ragel processing
Use Go's unicode.IsLetter() and unicode.IsDigit() to validate identifiers
Modify the identifier pattern to accept any non-ASCII bytes, then validate

Recommended Solution: Hybrid Approach

Modify scanner.rl to accept broader character ranges:
```
identifier = (alpha | '_' | 0x80..0xFF) . (alnum | '_' | '-' | 0x80..0xFF)*  '?'?
```
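This admits every byte of a multi-byte UTF-8 sequence (lead bytes as well as continuation bytes). It also admits byte sequences that are not valid UTF-8, so the Go validation step has to reject those.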

Add Go validation in the scanner's Identifier action:

// Validate the identifier using Go's unicode package.
if !isValidUnicodeIdentifier(lex.token()) {
    // Report a syntax error here, using whatever error path the lexer already has.
}

Implement validation function:

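import (
    "strings"
    "unicode"
)

// isValidUnicodeIdentifier reports whether s follows the identifier grammar,
// with Unicode letters and digits in place of the ASCII-only classes.
func isValidUnicodeIdentifier(s string) bool {
    // A single trailing '?' is allowed, as in the Ragel pattern.
    runes := []rune(strings.TrimSuffix(s, "?"))
    if len(runes) == 0 {
        return false
    }
    // The first character must be a letter or underscore.
    if !unicode.IsLetter(runes[0]) && runes[0] != '_' {
        return false
    }
    // The rest may be letters, digits, underscores, or hyphens.
    // Invalid UTF-8 decodes to U+FFFD, which is not a letter, so it is rejected.
    for _, r := range runes[1:] {
        if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' && r != '-' {
            return false
        }
    }
    return true
}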

Files to Modify

expressions/scanner.rl - Update identifier pattern
expressions/scanner.go - Will be regenerated
Add Unicode validation logic
Update tests to include Unicode test cases

Testing Requirements

Add test cases with Chinese, Japanese, Arabic, Cyrillic characters
Ensure backward compatibility with existing ASCII identifiers
Test edge cases like combining characters, emoji
Verify performance impact of Unicode validation
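
The cases listed above could be collected into a table along these lines. This is a sketch under assumptions: validIdentifier is a hypothetical stand-in for whatever validation the lexer ends up using (here a range-based variant of the letter/digit rule proposed earlier), and the expected values describe that rule, not a decision about what the library should accept.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// validIdentifier is a hypothetical stand-in for the lexer's validation:
// a letter or '_' first, then letters, digits, '_' or '-', with an
// optional trailing '?'.
func validIdentifier(s string) bool {
	s = strings.TrimSuffix(s, "?")
	if s == "" {
		return false
	}
	for i, r := range s {
		switch {
		case unicode.IsLetter(r) || r == '_':
			// allowed in any position
		case i > 0 && (unicode.IsDigit(r) || r == '-'):
			// allowed after the first character
		default:
			return false
		}
	}
	return true
}

var cases = []struct {
	name string
	want bool
}{
	{"app", true},                // existing ASCII identifier
	{"is-ok?", true},             // hyphen and trailing '?'
	{"描述", true},                 // Chinese
	{"なまえ", true},                // Japanese
	{"\u0627\u0633\u0645", true}, // Arabic, written with escapes
	{"имя", true},                // Cyrillic
	{"1abc", false},              // must not start with a digit
	{"😀", false},                 // emoji are symbols, not letters
	{"e\u0301", false},           // combining accent is a mark, not a letter
}

func main() {
	for _, c := range cases {
		got := validIdentifier(c.name)
		fmt.Printf("%q: got %v, want %v\n", c.name, got, c.want)
	}
}
```

Note that a letters-and-digits rule rejects combining marks, which also excludes scripts that rely on them; whether that is acceptable is a design choice the tests would need to pin down.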

Considerations

Performance impact of Unicode validation
Backward compatibility
May need to handle normalization (NFC vs NFD)

Aug 30 '25 02:08 osteele