Make @base64d tolerant of newlines

Open robin-a-meade opened this issue 8 months ago • 6 comments

It would be nice if @base64d was tolerant of newlines.

$ echo -n 'Hello, World!' | jaq -sRr @base64 | jaq -sRr @base64d
Error: "Invalid symbol 10, offset 20."

Maybe do something like https://docs.rs/testserver/0.1.1/src/testserver/lib.rs.html#56-61

Or use https://crates.io/crates/data-encoding .

robin-a-meade avatar Apr 27 '25 06:04 robin-a-meade

Hi @robin-a-meade, sorry for having taken such a long time to respond!

To understand this issue:

$ echo -ne 'a\n' | jaq -sRr @base64 | jaq -rR '.'
YQo=
$ echo -ne 'a\n' | jaq -sRr @base64 | jaq -srR '.'
YQo=

$ # notice the empty line before!

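So slurping in the input (via -s) keeps the trailing newline that the first jaq prints after its output, and that newline ends up inside the string passed to @base64d. May I ask why you're using -s in the first place in the jaq invocation of @base64d? Without that, your filter works fine.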

Apart from that, there seem to be a few differences w.r.t. base64 decoding between jq and jaq. For example:

$ echo -n 'a' | jaq -sRr @base64
YQ==
$ echo -n 'YQ==YQ==' | jq -sR '@base64d' # let's just double the previous output
"a"
$ echo -n 'YQ==YQ==' | jaq -sR '@base64d'
Error: "Invalid symbol 61, offset 2."

So jq seems to parse until the end of the base64-encoded string, and discards everything after it. I imagine that this could easily lead to silent data corruption. In contrast, jaq's @base64d only succeeds if the whole string passed to it is valid base64 data.
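To read the two error messages above, it helps to look up the reported symbols. The sketch below assumes that "symbol" is the decimal byte value of the offending character and that offsets are zero-based, which is consistent with both errors in this thread; `byte_value` is a hypothetical helper, not part of jaq.

```shell
# Hypothetical helper: print the decimal value of the single byte read from stdin.
byte_value() {
  od -An -tu1 | tr -d ' \n'
}

# "Invalid symbol 10" would then be a line feed, "Invalid symbol 61" a padding '='.
newline_code=$(printf '\n' | byte_value)
pad_code=$(printf '=' | byte_value)

# Assuming zero-based offsets, offset 2 in 'YQ==YQ==' is the third character.
third_char=$(printf 'YQ==YQ==' | cut -c3)

echo "line feed: $newline_code"
echo "padding:   $pad_code"
echo "character at offset 2: $third_char"
```

Under these assumptions, the first error points at the newline that follows the 20 base64 characters, and the second at the first `=` of the doubled input.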

01mf02 avatar May 19 '25 09:05 01mf02

Hi @01mf02, sorry for the late response, and thank you for your comments.

I slurped the input (via -s) because I wanted to do a test of jaq's base64 encoding and decoding and figured a good test would be to attempt a roundtrip of encoding and decoding a bunch of text.

For example:

lorem.txt

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus nec justo sit
amet erat porta condimentum vitae ut risus. Nam eu elit et massa commodo
scelerisque quis eget ex. In sit amet mi sem. Phasellus efficitur ultricies ex,
vel interdum sem dignissim vel. Integer posuere nibh risus.

The base64 utility from GNU CoreUtils is able to roundtrip base64 encoding/decoding:

base64 <lorem.txt | base64 -d
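A round trip like this can be checked byte for byte with `cmp` instead of comparing the output by eye. This is only a sketch: the temporary file is a hypothetical stand-in for lorem.txt, and `base64 -d` is used as in the command above.

```shell
# Hypothetical stand-in for lorem.txt: a temporary file with multi-line text.
sample=$(mktemp)
printf 'Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit.\n' >"$sample"

# Encode, decode, and compare the result with the original file.
# cmp -s prints nothing and exits with status 0 only if both inputs are identical.
if base64 <"$sample" | base64 -d | cmp -s - "$sample"; then
  roundtrip=ok
else
  roundtrip=mismatch
fi

echo "roundtrip: $roundtrip"
rm -f "$sample"
```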

But when I plug in jaq for the decoding, it gives an error:

$ base64 <lorem.txt | jaq -sRr @base64d
Error: "Invalid symbol 10, offset 76."

If I eliminate the newlines using the -w 0 option, it works:

$ base64 -w 0 <lorem.txt | jaq -sRr @base64d
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus nec justo sit
amet erat porta condimentum vitae ut risus. Nam eu elit et massa commodo
scelerisque quis eget ex. In sit amet mi sem. Phasellus efficitur ultricies ex,
vel interdum sem dignissim vel. Integer posuere nibh risus.
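When the base64 text has already been wrapped and -w 0 cannot be applied at the source, the line breaks can be removed before decoding. The following is a minimal sketch of that workaround, not something jaq does itself; `strip_newlines` is a hypothetical helper, and the wrapped sample is "Hello, World!" encoded and split by hand into 8-column lines.

```shell
# Hypothetical helper: delete CR and LF so the decoder sees one unbroken string.
strip_newlines() {
  tr -d '\r\n'
}

# "Hello, World!" in base64, wrapped by hand into 8-column lines.
wrapped='SGVsbG8s
IFdvcmxk
IQ=='

# Remove the line breaks, then decode the result.
unwrapped=$(printf '%s\n' "$wrapped" | strip_newlines)
decoded=$(printf '%s' "$unwrapped" | base64 -d)

echo "$unwrapped"
echo "$decoded"
```

The stripped stream has the same shape as the `base64 -w 0` output above, so it should be possible to pipe it into `jaq -sRr @base64d` in the same way.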

I researched it and learned that whether newlines should be tolerated in base64 encoded text is debatable.

The marshallpierce/rust-base64 library, which jaq depends on, has a FAQ entry about it.

In the Java world, java.util.Base64.getDecoder() does not tolerate newlines when decoding base64-encoded data, but [java.util.Base64.getMimeDecoder()](https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html) does.

In Node.js, Buffer.from(base64String, 'base64').toString('utf-8') tolerates them: "Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored." https://nodejs.org/api/buffer.html

In browsers, Window.atob() is forgiving: https://infra.spec.whatwg.org/#forgiving-base64

I'll defer to you in the case of jaq.

robin-a-meade avatar Aug 05 '25 20:08 robin-a-meade