
DSL: add functions for base64 encoding & decoding

Open b97tsk opened this issue 1 year ago • 7 comments

I tried to rewrite my jq script in Miller but got stuck immediately.

b97tsk avatar Mar 15 '23 08:03 b97tsk

Great idea @b97tsk ! :)

johnkerl avatar Mar 16 '23 14:03 johnkerl

ldap2json would certainly benefit from that! )

onoraba avatar Jan 11 '24 13:01 onoraba

@b97tsk @onoraba one question that needs to be addressed: suppose we do decode, and the data is non-ASCII. I'm not sure Miller strings (per se) would suffice anymore.

In Python there are "foo" and b"foo", with types str and bytes. I wonder if Miller would need these as well.

Can either of you perhaps share some examples of what some of your data might look like post-decode?

johnkerl avatar Jan 11 '24 14:01 johnkerl

In the LDAP case, base64 is used for Cyrillic strings. The exec variant works but is too slow:

$ echo "b64==YT3Qv9GA0LjQstC10YIsYj3QvNC40YA=" | mlr --idkvp --ips '==' --ojson --no-jlistwrap put 'func base64d(s) { return exec("openssl", ["enc", "-base64", "-d"], {"stdin_string": s . "\n" }); }; for(k,v in $*) { substr1(v,-1,-1) == "=" && k !=~ "binary" { $*[k] = base64d(v); }; };'
{ "b64": "a=привет,b=мир" }
$

onoraba avatar Jan 11 '24 18:01 onoraba

How about returning only a string: the decoded text if it is "valid", or something hexfmt-like otherwise? See https://pkg.go.dev/unicode/utf8#ValidString

onoraba avatar Jan 11 '24 19:01 onoraba

@onoraba I like that!

  • In the shorter term (very easy), decode string to string
    • If ValidString: return as is
    • else return a hexfmt
  • In the longer term (a bit more work)
    • Create a bytes type in the Miller DSL
    • Support b"..." literals in the DSL
    • Extend some other functions to operate on bytes, e.g. md5()
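The shorter-term option could look roughly like the Go sketch below. It is only an illustration under stated assumptions: `decodeBase64ToString` is a hypothetical name, not an existing Miller function, and the `0x`-prefixed hex fallback is one guess at what "a hexfmt" might mean here.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

// decodeBase64ToString is a hypothetical helper (not part of Miller).
// It decodes standard base64 and returns the result as a string when the
// bytes are valid UTF-8; otherwise it returns a hex rendering of the bytes.
func decodeBase64ToString(s string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", err
	}
	if utf8.Valid(raw) {
		// Valid UTF-8: return as is.
		return string(raw), nil
	}
	// Assumption: "hexfmt" is taken to mean a 0x-prefixed hex dump.
	return "0x" + hex.EncodeToString(raw), nil
}

func main() {
	// The Cyrillic LDAP-style value from earlier in the thread.
	text, _ := decodeBase64ToString("YT3Qv9GA0LjQstC10YIsYj3QvNC40YA=")
	fmt.Println(text) // a=привет,b=мир

	// Bytes 0xff 0xfe are not valid UTF-8, so the hex fallback is used.
	bin, _ := decodeBase64ToString("//4=")
	fmt.Println(bin) // 0xfffe
}
```

With this approach callers always get a string back, at the cost of not being able to tell a hex fallback from decoded text that happens to look like hex; a real bytes type would avoid that ambiguity.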

johnkerl avatar Jan 11 '24 19:01 johnkerl

@johnkerl

Sir, adding bytes would bring much joy to parsing MS LDAP DNS entries, etc. Right now it takes an OS exec with a base64 | dd | iconv pipe, which could be much neater.

onoraba avatar Jul 09 '24 16:07 onoraba