When the key name of the parameter is Chinese, the template cannot be resolved
Checklist
- [x] I have searched the issue list
- [ ] I have tested my example against Shopify Liquid. (This isn't necessary if the actual behavior is a panic, or an error for which IsTemplateError returns false.)
Expected Behavior
bindings := map[string]interface{}{
    "app": "test_app",
    "描述": "content",
}
template := `应用:{{app}} 描述:{{ 描述 }}`
expected := `应用:test_app 描述:content`
Actual Behavior
syntax error in "描述" in {{ 描述 }}
Detailed Description
Possible Solution
I'm not even sure a key name can have a slash '/' in it..
Thank you for the bug report! I finally have time to work on it.
Shopify Liquid does not support Chinese variable names either. Shopify/liquid#31 closes it as "won't fix". (It's actually got a richer history. It was fixed, and then the fix was reverted due to performance impacts.)
However, I'm not comfortable with that as a justification for this library not to support them. I'll look into what it would take to fix this.
The experience of Shopify/liquid#31 warns that this work should be accompanied by benchmarks.
Here's an analysis and what it would take to fix this:
Current State
This Implementation
- The Ragel lexer in expressions/scanner.rl uses ASCII-only character classes
- Pattern: identifier = (alpha | '_') . (alnum | '_' | '-')* '?'?
- alpha and alnum in Ragel only match ASCII characters
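To make the failure concrete, here is a minimal Go sketch that approximates the pattern byte by byte. It is a hand-written stand-in, not the generated scanner, and the function names are hypothetical. The point is that every byte of a UTF-8 encoded Chinese character is 0x80 or higher, so the first byte already fails the ASCII-only `alpha` test.

```go
package main

import "fmt"

// isASCIIAlpha and isASCIIAlnum mimic Ragel's built-in alpha and alnum
// classes, which only cover ASCII letters and digits.
func isASCIIAlpha(b byte) bool {
	return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
}

func isASCIIAlnum(b byte) bool {
	return isASCIIAlpha(b) || (b >= '0' && b <= '9')
}

// matchesASCIIIdentifier approximates
//   (alpha | '_') . (alnum | '_' | '-')* '?'?
// applied to the whole input string (hypothetical helper).
func matchesASCIIIdentifier(s string) bool {
	if len(s) == 0 || !(isASCIIAlpha(s[0]) || s[0] == '_') {
		return false
	}
	body := s[1:]
	// Strip the single optional trailing '?'.
	if n := len(body); n > 0 && body[n-1] == '?' {
		body = body[:n-1]
	}
	for i := 0; i < len(body); i++ {
		if c := body[i]; !(isASCIIAlnum(c) || c == '_' || c == '-') {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"app", "is_empty?", "描述"} {
		// % x prints the raw UTF-8 bytes of the name in hex.
		fmt.Printf("%q bytes=% x match=%v\n", name, name, matchesASCIIIdentifier(name))
	}
}
```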
Shopify Liquid
- Also doesn't officially support Unicode in variable names
- Uses similar regex patterns without Unicode support: VariableSegment = /[\w\-]/
- The community requested this feature (Shopify/liquid#31); a fix landed but was later reverted
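As a rough illustration of the difference between an ASCII-only and a Unicode-aware character class, here is a sketch using Go's regexp package. This is Go's RE2 syntax, not Ruby's regex engine, so it only demonstrates the idea; the variable names are made up for the example. In Go, `\w` is exactly `[0-9A-Za-z_]`, while `\p{L}` and `\p{Nd}` match Unicode letters and decimal digits.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// ASCII-only: \w covers [0-9A-Za-z_] in Go's regexp syntax.
	asciiSegment = regexp.MustCompile(`^[\w\-]+$`)
	// Unicode-aware: any letter or decimal digit, plus '_' and '-'.
	unicodeSegment = regexp.MustCompile(`^[\p{L}\p{Nd}_\-]+$`)
)

func main() {
	for _, name := range []string{"app", "描述", "описание"} {
		fmt.Printf("%q ascii=%v unicode=%v\n",
			name, asciiSegment.MatchString(name), unicodeSegment.MatchString(name))
	}
}
```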
What It Would Take to Fix
The Core Issue
The Ragel lexer uses ASCII-only character classes that need to be extended for Unicode support.
Implementation Approaches
Option 1: Ragel Unicode Support (Complex)
- Generate Unicode character classes using Ragel's unicode2ragel.rb script
- Create unicode.rl file with Unicode character class definitions (ualpha, ualnum)
- Modify scanner.rl to include and use Unicode classes:
  - Replace alpha with ualpha
  - Replace alnum with ualnum
- Regenerate scanner.go using go generate
Option 2: Post-Processing with Go's Unicode Support (Simpler)
- Keep Ragel for basic tokenization but allow wider character sets
- Add Unicode validation in the Go code after Ragel processing
- Use Go's unicode.IsLetter() and unicode.IsDigit() to validate identifiers
- Modify the identifier pattern to accept any non-ASCII bytes, then validate
Recommended Solution: Hybrid Approach
- Modify scanner.rl to accept broader character ranges:
    identifier = (alpha | '_' | 0x80..0xFF) . (alnum | '_' | '-' | 0x80..0xFF)* '?'?
  This lets the lexer accept the lead and continuation bytes of multi-byte UTF-8 sequences. It does not by itself check that those bytes decode to letters or digits, which is why the next step is needed.
- Add Go validation in the scanner's Identifier action:
    // Validate Unicode identifier using Go's unicode package
    if !isValidUnicodeIdentifier(lex.token()) {
        return error // pseudocode: report a syntax error here
    }
- Implement validation function (requires import "unicode"):
    func isValidUnicodeIdentifier(s string) bool {
        runes := []rune(s)
        if len(runes) == 0 {
            return false
        }
        // First character must be letter or underscore
        if !unicode.IsLetter(runes[0]) && runes[0] != '_' {
            return false
        }
        // Rest can be letters, digits, underscore, hyphen, or '?'
        // (the Ragel pattern only lets '?' through as the final character)
        for _, r := range runes[1:] {
            if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' && r != '-' && r != '?' {
                return false
            }
        }
        return true
    }
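Given the performance concern, one possible variant iterates over the string directly instead of building a []rune slice, and handles ASCII bytes without consulting the Unicode tables. This is only a sketch: the name isValidIdentifierNoAlloc is hypothetical, and unlike the function above it accepts '?' only as a single trailing character. Whether it is measurably faster would need a benchmark.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isValidIdentifierNoAlloc validates an identifier without converting it
// to a rune slice (hypothetical helper, not part of the library).
func isValidIdentifierNoAlloc(s string) bool {
	// Allow one optional trailing '?', as in the Ragel pattern.
	s = strings.TrimSuffix(s, "?")
	if s == "" {
		return false
	}
	// Ranging over a string decodes UTF-8; i is the byte offset, so
	// i == 0 identifies the first character. Invalid bytes decode to
	// U+FFFD, which is neither a letter nor a digit and is rejected.
	for i, r := range s {
		switch {
		case r == '_':
			// Underscore is allowed in any position.
		case r < utf8.RuneSelf:
			// ASCII fast path.
			isAlpha := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
			isDigit := r >= '0' && r <= '9'
			if !isAlpha && !(i > 0 && (isDigit || r == '-')) {
				return false
			}
		default:
			// Non-ASCII: letters anywhere, digits after the first character.
			if !unicode.IsLetter(r) && !(i > 0 && unicode.IsDigit(r)) {
				return false
			}
		}
	}
	return true
}

func main() {
	for _, name := range []string{"app", "描述", "is_ok?", "9lives", "\xff"} {
		fmt.Printf("%q valid=%v\n", name, isValidIdentifierNoAlloc(name))
	}
}
```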
Files to Modify
- expressions/scanner.rl - Update identifier pattern
- expressions/scanner.go - Will be regenerated
- Add Unicode validation logic
- Update tests to include Unicode test cases
Testing Requirements
- Add test cases with Chinese, Japanese, Arabic, Cyrillic characters
- Ensure backward compatibility with existing ASCII identifiers
- Test edge cases like combining characters, emoji
- Verify performance impact of Unicode validation
Considerations
- Performance impact of Unicode validation
- Backward compatibility
- May need to handle normalization (NFC vs NFD)
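For the performance question, a rough way to get numbers for the validation step alone is testing.Benchmark, which runs outside of go test. This sketch only measures the proposed function on two inputs; it says nothing about the cost of the modified Ragel pattern or of end-to-end template parsing, which would need benchmarks in the library itself. The helper name benchValidate is an assumption for the example.

```go
package main

import (
	"fmt"
	"testing"
	"unicode"
)

// isValidUnicodeIdentifier is the validation function proposed above,
// reproduced so that this example compiles on its own.
func isValidUnicodeIdentifier(s string) bool {
	runes := []rune(s)
	if len(runes) == 0 {
		return false
	}
	if !unicode.IsLetter(runes[0]) && runes[0] != '_' {
		return false
	}
	for _, r := range runes[1:] {
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' && r != '-' && r != '?' {
			return false
		}
	}
	return true
}

// sink keeps the compiler from optimizing the call away.
var sink bool

// benchValidate times repeated validation of one input (hypothetical helper).
func benchValidate(input string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = isValidUnicodeIdentifier(input)
		}
	})
}

func main() {
	for _, input := range []string{"test_app", "描述"} {
		r := benchValidate(input)
		fmt.Printf("%q: %s %s\n", input, r.String(), r.MemString())
	}
}
```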