
Full curl-like HTTP input operator

Status: Open. philrz opened this issue 3 years ago (3 comments).

The currently available from ( get http://... ) works great if the data is being retrieved from a simple, public, GET-accessible endpoint. However, plenty of REST-accessible data can only be retrieved using other HTTP methods, additional headers, a request payload, etc.

To cite one recent example I bumped into: try.zeek.org accepts a POST of Zeek script code and returns the result, e.g.:

$ curl -H "Content-Type: application/json" -X POST -d '{"sources":[{"content":"print(SSL::cipher_desc);\n","name":"main.zeek"}],"version":"5.0.0","pcap":""}' https://try.zeek.org/run

{"job": "618567", "stdout": "{\n[49241] = TLS_DH_DSS_WITH_ARIA_256_GCM_SHA384, ...

Since curl is something of a lingua franca for REST API examples, it would be super handy to have something with a simple 1-to-1 mapping from common curl options.
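To make the 1-to-1 mapping idea concrete, here is a minimal Go sketch (Go being the language Zed is implemented in) of how the three curl options used above, -X, -H, and -d, could be translated into a method, a header map, and a body. The type curlRequest and the function parseCurlArgs are hypothetical names for illustration only, not part of Zed, and the sketch assumes the shell has already handled quoting, so it receives a plain argument list.

```go
package main

import (
	"fmt"
	"strings"
)

// curlRequest holds the subset of a curl invocation that maps onto the
// method, headers, and body parameters discussed in this issue (hypothetical type).
type curlRequest struct {
	URL     string
	Method  string
	Headers map[string][]string
	Body    string
}

// parseCurlArgs is a hypothetical helper that understands only -X, -H, and -d.
// Any other option is rejected rather than silently ignored.
func parseCurlArgs(args []string) (curlRequest, error) {
	req := curlRequest{Headers: map[string][]string{}}
	hasBody := false
	for i := 0; i < len(args); i++ {
		arg := args[i]
		switch arg {
		case "-X", "-H", "-d":
			if i+1 >= len(args) {
				return req, fmt.Errorf("%s requires a value", arg)
			}
			i++
			val := args[i]
			switch arg {
			case "-X":
				req.Method = val
			case "-H":
				// "Name: value" becomes one entry in the header map.
				name, value, ok := strings.Cut(val, ":")
				if !ok {
					return req, fmt.Errorf("malformed header %q", val)
				}
				name = strings.TrimSpace(name)
				req.Headers[name] = append(req.Headers[name], strings.TrimSpace(value))
			case "-d":
				req.Body = val
				hasBody = true
			}
		default:
			if strings.HasPrefix(arg, "-") {
				return req, fmt.Errorf("unsupported option %q", arg)
			}
			req.URL = arg
		}
	}
	// Like curl, -d without -X implies POST; otherwise the default is GET.
	if req.Method == "" {
		if hasBody {
			req.Method = "POST"
		} else {
			req.Method = "GET"
		}
	}
	return req, nil
}

func main() {
	// The curl command from the top of this issue, split into arguments.
	req, err := parseCurlArgs([]string{
		"-H", "Content-Type: application/json",
		"-X", "POST",
		"-d", `{"sources":[{"content":"print(SSL::cipher_desc);\n","name":"main.zeek"}],"version":"5.0.0","pcap":""}`,
		"https://try.zeek.org/run",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(req.Headers)
}
```

Two simplifications are worth flagging: repeated -d flags overwrite each other here, whereas curl joins them with &, and long-form options such as --header are rejected rather than mapped.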

This also reminded me of Juttle's HTTP input adapter.

philrz, Nov 22 '22 17:11

I just bumped into another use case. There's some sample CSV data I want to download from an Italian site to reference in some docs. The column headers come back in Italian unless I add a header like Accept-Language: en-US. So for now I have to download it via curl -H 'Accept-Language: en-US' and pipe it to zq, but I'd love to be able to do it in one shot using zq with the get operator, if only I had a way to add the header.

philrz, Jan 18 '23 02:01

I propose we start by adding a way to specify an HTTP method, headers, and request body.

The get syntax currently looks like this:

get <uri> [format <format>]

I suggest we expand it to

get <uri> [format <format>] [method <method>] [headers <headers>] [body <body>]

where <method> and <body> are strings and <headers> is a Zed record with string-valued fields.

(Edited to add body.)

nwt, Mar 21 '23 16:03

The changes in #4572 bring significant additional functionality along the lines of what's proposed in this issue. Here's verification in Zed commit 1ac878f, where this new functionality is used to specify a non-GET method, headers, and a body matching the curl command in the issue's opening comment. I've tacked on some additional Zed that captures the spirit of what I originally intended to do with the retrieved data.

$ zq -version
Version: v1.8.1-26-g1ac878fa

$ zq -Z '
get https://try.zeek.org/run
  method "POST"
  headers {"Content-Type": ["application/json"]}
  body "{\"sources\":[{\"content\":\"print(SSL::cipher_desc);\\n\",\"name\":\"main.zeek\"}],\"version\":\"5.0.0\",\"pcap\":\"\"}"
  | split(stdout, "\n")
  | over this
  | grep("=")'

"[124] = TLS_RSA_WITH_3DES_EDE_CBC_RMD,"
"[150] = TLS_RSA_WITH_SEED_CBC_SHA,"
"[16] = TLS_DH_RSA_WITH_3DES_EDE_CBC_SHA,"
...

However, there are some loose ends we'll likely want to tie up, so I'm reopening this issue as a reminder for us to chip away at them. A partial list:

  1. The automated tests in #4572 confirm the new functionality parses in the language, but we don't yet have automated tests that actually exercise it the way the example above does.
  2. The from operator docs have not yet been updated to reflect the new functionality. For now the leading text in #4572 and what's shown in the example above will have to suffice.
  3. The escaping feels inconsistent: the headers parameter is a record literal, whereas body is a string, so in common cases like this one where a JSON body is expected, the JSON must be escaped by hand or with a tool like this. It would be helpful if we could find a way to simplify this; if not, we should document it very carefully and maybe reference helper tooling (or create a new Zed helper function?) so users aren't tripped up by it.
  4. When trying the new functionality as a user, following the keyword get with a non-GET method such as POST feels a little strange. I find myself wondering if it should have a more generic name than get, e.g., fetch, http, url, curl, or uri.

Thanks @dianetc!

philrz, Jun 10 '23 00:06