
Base64 Decode Non-ASCII Data

Open jg-rp opened this issue 3 years ago • 0 comments

The base64_decode and base64_url_safe_decode filters don't seem to handle Unicode characters (or any non-ASCII data) well. For example:

Ruby version 2.5.5, Liquid version 5.0.2 (unreleased)

require 'liquid'

source = <<~LIQUID
some string                  : {{ s }}
uppercase string             : {{ s | upcase }}
b64 string                   : {{ s | base64_encode }}
b64 decoded string           : {{ s | base64_encode | base64_decode }}
filter on b64 decoded string : {{ s | base64_encode | base64_decode | upcase}}
LIQUID

template = Liquid::Template.parse(source)
puts template.render('s' => 'Hello 👋, sigma σ, pound £')

Output

some string                  : Hello 👋, sigma σ, pound £
uppercase string             : HELLO 👋, SIGMA Σ, POUND £
b64 string                   : SGVsbG8g8J+Riywgc2lnbWEgz4MsIHBvdW5kIMKj
b64 decoded string           : Liquid error: internal
filter on b64 decoded string : Liquid error: internal

In isolation, Base64 decoded Unicode strings can be output without error.

puts Liquid::Template.parse("{{ 'Hello 👋, sigma σ, pound £' | base64_encode | base64_decode }}").render
# Hello 👋, sigma σ, pound £

But the binary (ASCII-8BIT) string returned from base64_decode does not play nicely with other string filters. Notice the lowercase sigma.

puts Liquid::Template.parse("{{ 'Hello 👋, sigma σ, pound £' | base64_encode | base64_decode | upcase }}").render
# HELLO 👋, SIGMA σ, POUND £
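
The lowercase sigma is consistent with how Ruby treats binary strings: case mapping on an ASCII-8BIT string only touches ASCII letters. Here is a minimal sketch outside Liquid, assuming the filter's result is equivalent to what `String#unpack1("m0")` returns. The helper name `binary_roundtrip` is hypothetical and not part of Liquid.

```ruby
# Round-trip a string through Base64 using core String methods only.
# Assumption: this mirrors what base64_decode hands back, i.e. a binary string.
def binary_roundtrip(text)
  [text].pack("m0").unpack1("m0")
end

decoded = binary_roundtrip("sigma σ")
p decoded.encoding == Encoding::BINARY  # true

# On a binary string, upcase only maps ASCII letters, so σ's bytes pass through.
upcased = decoded.upcase
puts upcased.force_encoding("UTF-8")    # SIGMA σ

# Retagging the bytes as UTF-8 before upcase gives full Unicode case mapping.
puts binary_roundtrip("sigma σ").force_encoding("UTF-8").upcase  # SIGMA Σ
```

So the bytes themselves are intact after decoding; only the encoding tag on the string differs.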

And in some cases we'll get a character encoding exception (I don't quite understand what is doing the implicit decoding of bytes to Unicode in some of these examples).

source = "{{ '£' | base64_encode | base64_decode }}"
template = Liquid::Template.parse(source)
puts "#{source} #{template.render}"

Output

Traceback (most recent call last):
test_liquid.rb:190:in `<main>': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
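
A possible explanation, sketched outside Liquid: nothing decodes the bytes implicitly. Ruby checks encoding compatibility whenever two strings are concatenated or interpolated, and it only raises when both sides contain non-ASCII bytes under different encodings. That would fit the examples above, where the failure appears only once the decoded value is joined to other text containing `£` or `👋`. The helpers `binary_roundtrip` and `try_concat` are hypothetical names used for illustration.

```ruby
# Returns the Base64 round trip of text as a binary (ASCII-8BIT) string.
def binary_roundtrip(text)
  [text].pack("m0").unpack1("m0")
end

# Appends the decoded bytes to a UTF-8 prefix and reports whether Ruby allows it.
def try_concat(prefix, text)
  prefix + binary_roundtrip(text)
  :ok
rescue Encoding::CompatibilityError
  :incompatible
end

p try_concat("pound: ", "£")    # :ok, the prefix is ASCII-only
p try_concat("£ and ", "abc")   # :ok, the decoded bytes are ASCII-only
p try_concat("£ and ", "£")     # :incompatible, both sides carry non-ASCII bytes
```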

Perhaps base64_decode and base64_url_safe_decode should take an optional character encoding argument, defaulting to UTF-8.
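
A rough sketch of what that could look like, written as a standalone method rather than a real Liquid filter. The name `decode_base64`, its signature, and the choice to raise on invalid bytes are all assumptions for illustration, not Liquid's actual API.

```ruby
# Hypothetical helper: strictly decode Base64, then tag the bytes with the
# requested encoding (UTF-8 by default) instead of leaving them binary.
def decode_base64(input, encoding = "UTF-8")
  decoded = input.unpack1("m0").force_encoding(encoding)
  # Reject byte sequences that are not valid in the requested encoding.
  raise ArgumentError, "decoded bytes are not valid #{encoding}" unless decoded.valid_encoding?
  decoded
end

encoded = ["Hello 👋, sigma σ, pound £"].pack("m0")
result = decode_base64(encoded)

p result.encoding == Encoding::UTF_8  # true
puts result.upcase                    # HELLO 👋, SIGMA Σ, POUND £
puts "£ #{result}"                    # interpolates without a CompatibilityError
```

Whether invalid input should raise, be scrubbed, or fall back to a binary string is a design choice this sketch leaves open.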

jg-rp, Jun 04 '21 15:06