
Allow "from" operator to accept upstream input

Open philrz opened this issue 2 years ago • 3 comments

At the time of the filing of this issue, Zed is at commit 2689e24.

This topic was recently highlighted by a user who asked the following question in a community Slack thread:

is there a way to make get requests with an authentication header? what I'm trying to do is take a CSV, take a specific column, input into an API request, and pick a field from the API response, and then merge that result into a new CSV

We ultimately helped the user by proposing some shell glue with multiple invocations of zq. However, their experience exposed a limitation of the many input operators (the from operator et al.): they can currently only sit at the "head" of a Zed pipeline. In this case, the user wanted to take information from upstream in the Zed pipeline and make it part of an HTTP request, e.g., their pseudocode:

What I'm trying to do is run something like this: http https://company.clearbit.com/v1/domains/find name=="$COMPANY_NAME_FROM_ZED_ITERATION" --auth=key | jq .key For each entry in an array from a zed expression
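
For illustration only, the workflow the user describes (take a CSV column, look each value up via an API, pick a field from the response, merge it into a new CSV) can be sketched outside of Zed. This is not the shell glue we proposed and not Zed syntax. `enrich_csv`, `fake_lookup`, and the `domain` field are hypothetical names; `fake_lookup` stands in for the authenticated HTTP GET so the sketch runs without network access.

```python
import csv
import io
from typing import Callable, Dict


def enrich_csv(csv_text: str, column: str,
               lookup: Callable[[str], Dict], field: str) -> str:
    """Return new CSV text with `field` from lookup(row[column]) appended to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Original columns plus the one picked from the API response.
    fieldnames = list(reader.fieldnames or []) + [field]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        response = lookup(row[column])         # one "request" per row
        row[field] = response.get(field, "")   # pick a single field
        writer.writerow(row)
    return out.getvalue()


def fake_lookup(name: str) -> Dict:
    """Hypothetical stand-in for an authenticated API call; returns canned data."""
    return {"domain": name.lower() + ".example", "other": 1}


if __name__ == "__main__":
    print(enrich_csv("name,city\nAcme,Paris\n", "name", fake_lookup, "domain"))
```

In a real run, `fake_lookup` would be replaced by a function that issues the request with the authentication header and parses the response.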

Thinking about this in a more general way, @nwt recognized that the input operators could be changed so that, when data arrives from upstream, the operator runs itself and everything downstream of it once for each upstream value. This would enable things such as:

  1. Populating headers/body/URLs/etc. of an HTTP request
    1. That should hopefully cover this user's specific inquiry
    2. FWIW, when I opened #4267, I envisioned something like this being a necessary building block
  2. Sourcing input data from pools/filenames generated upstream
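
To make the "once per upstream value" idea concrete, here is a minimal sketch of the semantics using Python generators. It is an assumption about how such an operator might behave, not a description of Zed's implementation. `dynamic_from`, `open_source`, and the `POOLS` dictionary are hypothetical names invented for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def dynamic_from(upstream: Iterable[T],
                 open_source: Callable[[T], Iterable[R]]) -> Iterator[R]:
    """For each upstream value, open a source derived from it and emit its records."""
    for value in upstream:
        # The source is chosen at run time from the upstream value,
        # e.g. a pool name, a file name, or a URL built from a field.
        yield from open_source(value)


# Hypothetical in-memory "pools" standing in for real data sources.
POOLS: Dict[str, List[dict]] = {
    "a": [{"id": 1}, {"id": 2}],
    "b": [{"id": 3}],
}

if __name__ == "__main__":
    # The upstream values name the pools to read; downstream sees all records.
    for record in dynamic_from(["a", "b"], lambda name: POOLS[name]):
        print(record)
```

Because the source is only known once a value arrives, nothing about it can be decided ahead of time in this sketch.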

In a discussion of this topic, @nwt and @mccanne recognized that special care would need to be taken for the optimizer to remain effective despite this dynamic behavior.

In another more recent discussion about the API-enabling variant of this, @mccanne wondered if we might have some kind of async/pipeline flag that would allow for parallelizing the HTTP connections invoked. This would mean the order is not guaranteed across the results, so another flag could be used to guarantee order when that's needed.
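
The trade-off between parallelism and ordering can be sketched as follows. This is only an illustration of the idea under stated assumptions, not a proposed design for the flag. `fetch_all`, `fake_fetch`, and the `ordered` parameter are hypothetical; `fake_fetch` simulates a slow HTTP request with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List


def fetch_all(values: List[int], fetch: Callable[[int], dict],
              workers: int = 4, ordered: bool = False) -> List[dict]:
    """Run fetch over values in parallel; preserve input order only if asked."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        if ordered:
            # map() yields results in input order, even if later calls finish first.
            return list(pool.map(fetch, values))
        # as_completed() yields results as each call finishes, in no fixed order.
        futures = [pool.submit(fetch, v) for v in values]
        return [f.result() for f in as_completed(futures)]


def fake_fetch(value: int) -> dict:
    """Hypothetical stand-in for an HTTP request; larger values respond sooner."""
    time.sleep(0.01 * (5 - value))
    return {"value": value}


if __name__ == "__main__":
    print(fetch_all([1, 2, 3, 4], fake_fetch, ordered=True))
    print(fetch_all([1, 2, 3, 4], fake_fetch, ordered=False))
```

With `ordered=True` a slow early request holds back the results behind it, which is the cost of guaranteeing order.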

philrz, Aug 14 '23 20:08

Note to self: The API-enabling variant of this definitely seems worthy of a blog post once it's done.

philrz, Oct 12 '23 19:10

We've been discussing this one as a team but remain uncertain about the correct design approach, so we're putting the topic back on ice for a bit while we focus on other priorities.

philrz, Dec 21 '23 18:12