Base64 Decode Non-ASCII Data
The base64_decode and base64_url_safe_decode filters don't seem to handle Unicode characters (or any non-ASCII data) well. For example:
Ruby version 2.5.5, Liquid version 5.0.2 (unreleased)
require 'liquid'
source = <<~LIQUID
some string : {{ s }}
uppercase string : {{ s | upcase }}
b64 string : {{ s | base64_encode }}
b64 decoded string : {{ s | base64_encode | base64_decode }}
filter on b64 decoded string : {{ s | base64_encode | base64_decode | upcase}}
LIQUID
template = Liquid::Template.parse(source)
puts template.render('s' => 'Hello 👋, sigma σ, pound £')
Output
some string : Hello 👋, sigma σ, pound £
uppercase string : HELLO 👋, SIGMA Σ, POUND £
b64 string : SGVsbG8g8J+Riywgc2lnbWEgz4MsIHBvdW5kIMKj
b64 decoded string : Liquid error: internal
filter on b64 decoded string : Liquid error: internal
In isolation, Base64-decoded Unicode strings can be output without error.
puts Liquid::Template.parse("{{ 'Hello 👋, sigma σ, pound £' | base64_encode | base64_decode }}").render
# Hello 👋, sigma σ, pound £
But the ASCII-8BIT (binary) string returned from base64_decode does not play nicely with other string filters. Notice that the sigma stays lowercase.
puts Liquid::Template.parse("{{ 'Hello 👋, sigma σ, pound £' | base64_encode | base64_decode | upcase }}").render
# HELLO 👋, SIGMA σ, POUND £
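The lowercase sigma can be reproduced without Liquid. This is a minimal sketch, assuming base64_decode hands back the raw result of Ruby's Base64 decoding; it uses the core pack('m0') / unpack1('m0') calls that the Base64 module wraps, and the variable names are illustrative only. The decoded string is tagged as binary, so upcase maps only the ASCII letters; re-tagging the same bytes as UTF-8 restores full case mapping.

```ruby
original = 'sigma σ'

# Round-trip through Base64 using core Ruby (strict, no line breaks).
encoded = [original].pack('m0')
decoded = encoded.unpack1('m0')

# The decoded bytes are identical, but the string is tagged as binary
# (Encoding::BINARY is the same encoding as ASCII-8BIT).
decoded_encoding = decoded.encoding

# On a binary string, upcase only changes ASCII letters; the two bytes
# that make up "σ" are left untouched.
binary_upcase = decoded.upcase

# Re-tagging the bytes as UTF-8 lets upcase apply Unicode case mapping.
utf8_upcase = decoded.dup.force_encoding(Encoding::UTF_8).upcase

puts decoded_encoding
puts binary_upcase
puts utf8_upcase
```

Note that force_encoding does not convert anything; it only changes the label on the existing bytes, which is appropriate here because the bytes were UTF-8 before they were encoded.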
And in some cases we get a character encoding exception. (I don't quite understand what is doing the implicit decoding of bytes to Unicode in some of these examples.)
source = "{{ '£' | base64_encode | base64_decode }}"
template = Liquid::Template.parse(source)
puts "#{source} #{template.render}"
Output
Traceback (most recent call last):
test_liquid.rb:190:in `<main>': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
Perhaps base64_decode and base64_url_safe_decode should take an optional character encoding argument, defaulting to UTF-8.
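As a rough illustration of that suggestion, here is a sketch of what such filters might look like as a custom filter module. Everything in it is hypothetical: the module name EncodedBase64Filters, the filter names base64_decode_as and base64_url_safe_decode_as, and the fallback to binary when the bytes are not valid in the requested encoding are assumptions made for the example, not Liquid's actual implementation.

```ruby
# Hypothetical filter module: decode Base64, then tag the result with a
# caller-supplied encoding (UTF-8 by default).
module EncodedBase64Filters
  def base64_decode_as(input, encoding = 'UTF-8')
    tag_encoding(input.to_s.unpack1('m0'), encoding)
  end

  def base64_url_safe_decode_as(input, encoding = 'UTF-8')
    str = input.to_s.tr('-_', '+/')           # map back to the standard alphabet
    str += '=' * ((4 - str.length % 4) % 4)   # restore any stripped padding
    tag_encoding(str.unpack1('m0'), encoding)
  end

  private

  # Label the decoded bytes with the requested encoding. If the bytes are
  # not valid in that encoding, fall back to binary (an assumption of this
  # sketch) so that arbitrary data still round-trips.
  def tag_encoding(bytes, encoding)
    bytes.force_encoding(encoding)
    bytes.valid_encoding? ? bytes : bytes.force_encoding(Encoding::BINARY)
  end
end

# With the liquid gem loaded, the module could be registered like any
# other custom filter module, for example:
#   Liquid::Template.register_filter(EncodedBase64Filters)
#   {{ s | base64_encode | base64_decode_as | upcase }}
#   {{ s | base64_encode | base64_decode_as: 'ISO-8859-1' }}
```

With a UTF-8 result, later filters such as upcase see real characters rather than raw bytes, and joining the result with other UTF-8 text no longer mixes encodings. Malformed Base64 input still raises ArgumentError from unpack1 in this sketch; a real filter would probably want to turn that into a Liquid error.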