Make @base64d tolerant of newlines
It would be nice if @base64d was tolerant of newlines.
$ echo -n 'Hello, World!' | jaq -sRr @base64 | jaq -sRr @base64d
Error: "Invalid symbol 10, offset 20."
Maybe do something like https://docs.rs/testserver/0.1.1/src/testserver/lib.rs.html#56-61
Or use https://crates.io/crates/data-encoding .
Hi @robin-a-meade, sorry for having taken such a long time to respond!
To understand this issue:
$ echo -ne 'a\n' | jaq -sRr @base64 | jaq -rR '.'
YQo=
$ echo -ne 'a\n' | jaq -sRr @base64 | jaq -srR '.'
YQo=

$ # notice the empty line before!
So slurping in the input (via -s) keeps the trailing newline of the encoded output as part of the string.
May I ask why you're using -s in the first place in the jaq invocation of @base64d? Without that, your filter works fine.
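The trailing newline can be made visible by counting bytes. This is a minimal sketch, assuming GNU coreutils. Here base64 stands in for the first jaq invocation, on the assumption (consistent with the output above) that both print YQo= followed by a newline. The variable names are only for illustration.

```shell
# Encode the two bytes "a\n"; the encoder prints 4 characters plus a newline.
encoded_len=$(printf 'a\n' | base64 | wc -c)

# Deleting the newline leaves only the 4 base64 characters.
stripped_len=$(printf 'a\n' | base64 | tr -d '\n' | wc -c)

echo "with newline: $encoded_len bytes, without: $stripped_len bytes"
```

With -R alone the input is read line by line, so that newline acts as a separator; with -s it becomes part of the one string handed to @base64d.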
Apart from that, there seem to be a few differences w.r.t. base64 decoding between jq and jaq. For example:
$ echo -n 'a' | jaq -sRr @base64
YQ==
$ echo -n 'YQ==YQ==' | jq -sR '@base64d' # let's just double the previous output
"a"
$ echo -n 'YQ==YQ==' | jaq -sR '@base64d'
Error: "Invalid symbol 61, offset 2."
So jq seems to parse until the end of the base64-encoded string, and discards everything after it.
I imagine that this could easily lead to silent data corruption.
In contrast, jaq's @base64d only succeeds if the whole string passed to it is valid base64 data.
Hi @01mf02, sorry for the late response, and thank you for your comments.
I slurped the input (via -s) because I wanted to do a test of jaq's base64 encoding and decoding and figured a good test would be to attempt a roundtrip of encoding and decoding a bunch of text.
For example:
lorem.txt
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus nec justo sit
amet erat porta condimentum vitae ut risus. Nam eu elit et massa commodo
scelerisque quis eget ex. In sit amet mi sem. Phasellus efficitur ultricies ex,
vel interdum sem dignissim vel. Integer posuere nibh risus.
The base64 utility from GNU coreutils is able to round-trip base64 encoding and decoding:
base64 <lorem.txt | base64 -d
But when I plug in jaq for the decoding, it gives an error:
$ base64 <lorem.txt | jaq -sRr @base64d
Error: "Invalid symbol 10, offset 76."
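The numbers in this error message line up with how GNU base64 formats its output: it wraps lines at 76 characters by default, and byte 10 is the ASCII line feed. This is a small sketch to check that, assuming GNU coreutils. It uses the first line of lorem.txt as sample input, and the variable names are only for illustration.

```shell
# Sample input: more than 57 bytes, so the encoded form needs more than
# one 76-character line.
text='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus nec justo sit'

# Length of the first encoded line, excluding its newline.
first_line_len=$(printf '%s\n' "$text" | base64 | head -n 1 | tr -d '\n' | wc -c)

# Decimal value of the byte at offset 76 (the 77th byte of the output).
next_byte=$(printf '%s\n' "$text" | base64 | head -c 77 | tail -c 1 | od -An -tu1 | tr -d ' ')

echo "first line: $first_line_len characters, byte at offset 76: $next_byte"
```

So the decoder appears to stop at the first line break that base64 inserted, not at anything wrong with the encoded data itself.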
If I eliminate the newlines using the -w 0 option, it works:
$ base64 -w 0 <lorem.txt | jaq -sRr @base64d
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus nec justo sit
amet erat porta condimentum vitae ut risus. Nam eu elit et massa commodo
scelerisque quis eget ex. In sit amet mi sem. Phasellus efficitur ultricies ex,
vel interdum sem dignissim vel. Integer posuere nibh risus.
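When the encoded text is already wrapped, for example because it comes from a file or another tool, the newlines can also be removed after the fact. This is a minimal sketch, assuming GNU coreutils. The sample text and variable names are only illustrative.

```shell
text='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus nec justo sit'

# Wrapped output with the newlines deleted afterwards ...
stripped=$(printf '%s\n' "$text" | base64 | tr -d '\n')

# ... is the same single line that -w 0 produces directly.
unwrapped=$(printf '%s\n' "$text" | base64 -w 0)

[ "$stripped" = "$unwrapped" ] && echo "identical"

# The stripped form could then be fed to the decoder from this thread
# (not run here; it should behave like the -w 0 case):
#   base64 <lorem.txt | tr -d '\n' | jaq -sRr @base64d
```

This only works around the strictness on the caller's side; whether @base64d itself should skip newlines is the open question.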
I researched it and learned that whether newlines should be tolerated in base64 encoded text is debatable.
The marshallpierce/rust-base64 library, which jaq depends on, has a FAQ about it.
In the Java world, java.util.Base64.getDecoder() does not tolerate newlines when decoding base64-encoded data, but [java.util.Base64.getMimeDecoder()](https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html) does.
In Node.js, Buffer.from(base64String, 'base64').toString('utf-8') tolerates them: "Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored." https://nodejs.org/api/buffer.html
In browsers, Window.atob() is forgiving: https://infra.spec.whatwg.org/#forgiving-base64
I'll defer to you in the case of jaq.