Full curl-like HTTP input operator
The currently-available from ( get http://... ) works great if the data is being retrieved from a simple, public GET-accessible endpoint. However, there's plenty of REST-accessible data that requires being hit with different HTTP methods, additional headers, payload, etc.
Just to cite one recent example I bumped into, try.zeek.org takes a POST of Zeek script code and will return the result, e.g.:
$ curl -H "Content-Type: application/json" -X POST -d '{"sources":[{"content":"print(SSL::cipher_desc);\n","name":"main.zeek"}],"version":"5.0.0","pcap":""}' https://try.zeek.org/run
{"job": "618567", "stdout": "{\n[49241] = TLS_DH_DSS_WITH_ARIA_256_GCM_SHA384, ...
Considering curl is a lingua franca of sorts when it comes to showing REST API examples, it would be super handy if we had something with a simple 1-to-1 mapping from common curl options.
I also was reminded of Juttle's HTTP input adapter.
I just bumped into another use case. There's some sample CSV data I want to download from an Italian site to reference in some docs. The column headers come back in Italian unless I add a header like Accept-Language: en-US. So for now I'm having to download it via curl -H 'Accept-Language: en-US' and pipe it to zq, but I'd love to be able to use zq with the get operator in a one-shot, if only I had a way to add the header.
I propose we start by adding a way to specify an HTTP method, headers, and request body.
The get syntax currently looks like this:
get <uri> [format <format>]
I suggest we expand it to
get <uri> [format <format>] [method <method>] [headers <headers>] [body <body>]
where <method> and <body> are strings and <headers> is a Zed record with string-valued fields.
(Edited to add body.)
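To make the proposed mapping from curl options concrete, here is a minimal Python sketch that renders operator text in the shape proposed above. It is only an illustration of the proposal as worded in this comment (string-valued header fields), not of any shipped syntax. The helper name render_get, the example.com URL, and the csv format value are assumptions made up for the example, and it assumes JSON-style double-quoted strings are acceptable Zed string literals.

```python
import json


def render_get(uri, fmt=None, method=None, headers=None, body=None):
    """Render text for the *proposed* get syntax (hypothetical helper).

    Clauses are emitted in the order given in the proposal:
    get <uri> [format <format>] [method <method>] [headers <headers>] [body <body>]
    """
    parts = ["get", uri]
    if fmt is not None:
        parts += ["format", fmt]
    if method is not None:
        # <method> is a string, so it is emitted as a quoted literal.
        parts += ["method", json.dumps(method)]
    if headers:
        # <headers> is a record with string-valued fields.
        fields = ", ".join(
            f"{json.dumps(name)}: {json.dumps(value)}"
            for name, value in headers.items()
        )
        parts += ["headers", "{" + fields + "}"]
    if body is not None:
        # <body> is a string; json.dumps adds the quoting and escaping.
        parts += ["body", json.dumps(body)]
    return " ".join(parts)


# The Accept-Language use case, with a placeholder URL (assumption).
example = render_get(
    "https://example.com/data.csv",
    fmt="csv",
    headers={"Accept-Language": "en-US"},
)
print(example)
```

In this sketch, curl's -X, -H, and -d options correspond to the method, headers, and body arguments respectively.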
The changes in #4572 bring significant additional functionality along the lines of what's proposed in this issue. Here's verification in Zed commit 1ac878f where this new functionality is used to specify a non-GET method and headers/body similar to the curl shown in the opening text of the issue. I've tacked on some additional Zed that captures the spirit of what I originally intended to do with the data once retrieved.
$ zq -version
Version: v1.8.1-26-g1ac878fa
$ zq -Z '
get https://try.zeek.org/run
method "POST"
headers {"Content-Type": ["application/json"]}
body "{\"sources\":[{\"content\":\"print(SSL::cipher_desc);\\n\",\"name\":\"main.zeek\"}],\"version\":\"5.0.0\",\"pcap\":\"\"}"
| split(stdout, "\n")
| over this
| grep("=")'
"[124] = TLS_RSA_WITH_3DES_EDE_CBC_RMD,"
"[150] = TLS_RSA_WITH_SEED_CBC_SHA,"
"[16] = TLS_DH_RSA_WITH_3DES_EDE_CBC_SHA,"
...
However, there are some loose ends we'll likely want to tie up, so I'm reopening this issue as a reminder for us to chip away at them. A partial list:
- The automated tests in #4572 confirm the new functionality parses in the language, but we don't yet have automated tests that actually exercise it in ways similar to what's shown above.
- The from operator docs have not yet been updated to reflect the new functionality. For now the leading text in #4572 and what's shown in the example above will have to suffice.
- The escaping feels inconsistent, e.g., the headers parameter is a record literal whereas body is a string that therefore needs to be escaped with a tool like this for common cases like this where JSON bodies are expected. It seems it would be helpful if we could find a way to simplify this, or if not, just be very careful in how it's documented & maybe reference helper tooling (or create a new Zed helper function?) to make sure users aren't stuck by this.
- Trying the new functionality as a user, following the language keyword get with a non-GET method such as POST feels a little strange. I find myself wondering if it should have a generic name besides get, e.g., fetch, http, url, curl, or uri.
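On the escaping point, one possible stopgap is to generate the escaped body string with a small script instead of writing it by hand. The Python sketch below serializes a payload to compact JSON and then encodes that text as a double-quoted string. The helper name zed_body_literal is hypothetical, and the approach assumes that JSON string escaping is compatible with Zed's double-quoted string literals for payloads like this one.

```python
import json


def zed_body_literal(payload):
    """Return a double-quoted, escaped string holding payload as JSON text.

    Hypothetical helper: assumes JSON string escapes are valid inside a
    Zed string literal, which holds for the try.zeek.org example here.
    """
    # First pass: compact JSON text, matching what curl -d would send.
    json_text = json.dumps(payload, separators=(",", ":"))
    # Second pass: encode that text as a string literal, which escapes
    # the inner double quotes and backslashes.
    return json.dumps(json_text)


# Same payload as the try.zeek.org request shown in this issue.
payload = {
    "sources": [{"content": "print(SSL::cipher_desc);\n", "name": "main.zeek"}],
    "version": "5.0.0",
    "pcap": "",
}
print(zed_body_literal(payload))
```

The printed literal can be pasted after the body keyword in the verification example. Decoding it twice with json.loads gives back the original payload, which is a quick way to check that nothing was lost in the escaping.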
Thanks @dianetc!