
Add `zip` function to VRL

Open jszwedko opened this issue 2 years ago • 1 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

While writing up upgrade docs from the old tokenizer transform to remap, I realized it'd be helpful to have a zip function in VRL that would take two arrays, one of keys, one of fields, and create an object.

Attempted Solutions

No response

Proposal

$ zip(["a", "b"], [1, 2])

{ "a": 1, "b": 2}

Will have to decide on what should happen when the arrays don't match in length.

References

No response

Version

vector 0.22.0

jszwedko avatar Jun 10 '22 21:06 jszwedko

The zip method in the Rust std library takes iterators, which neatly sidesteps the mismatched lengths: the zipped iterator returns None as soon as either input iterator is exhausted, so the result is as long as the shorter input.
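
As a rough illustration of that truncating behavior, here is a minimal Rust sketch. The helper name `zip_to_map` and the choice of `BTreeMap` are assumptions made for the example, not anything taken from the VRL code base:

```rust
use std::collections::BTreeMap;

/// Pairs keys with values. `Iterator::zip` stops at the shorter input,
/// so surplus elements on either side are silently dropped.
/// (`zip_to_map` is a hypothetical helper, not a VRL function.)
fn zip_to_map(keys: &[&str], values: &[i64]) -> BTreeMap<String, i64> {
    keys.iter()
        .zip(values.iter())
        .map(|(k, v)| (k.to_string(), *v))
        .collect()
}

fn main() {
    // Equal lengths: every key gets a value.
    println!("{:?}", zip_to_map(&["a", "b"], &[1, 2])); // {"a": 1, "b": 2}

    // Mismatched lengths: the extra value `2` is dropped without an error.
    println!("{:?}", zip_to_map(&["a"], &[1, 2])); // {"a": 1}
}
```

Silent truncation is convenient, but it can also hide malformed input, which is why the mismatch behavior still needs an explicit decision for VRL.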

briankung avatar Aug 08 '22 13:08 briankung

Note that this is different from the Python semantics for iterating over two parallel arrays:

$ zip(["a", "b"], [1, 2])
[["a", 1], ["b", 2]]
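
For comparison, a small Rust sketch of the pair-list shape, as opposed to the key/value object in the proposal. The function name `zip_pairs` is made up for this example:

```rust
/// Python-style zip: returns a list of pairs rather than a key/value object.
/// (`zip_pairs` is a hypothetical name used only for illustration.)
fn zip_pairs<'a>(left: &[&'a str], right: &[i64]) -> Vec<(&'a str, i64)> {
    left.iter().copied().zip(right.iter().copied()).collect()
}

fn main() {
    let pairs = zip_pairs(&["a", "b"], &[1, 2]);
    println!("{:?}", pairs); // [("a", 1), ("b", 2)]
}
```

The pair list keeps ordering and allows duplicate keys, whereas an object would have to pick one value per key.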

mjperrone avatar Oct 16 '22 06:10 mjperrone

Also, I am curious how you'd do the proposed semantic in VRL without the proposed zip function. I'm struggling to do this myself.

mjperrone avatar Oct 16 '22 06:10 mjperrone

Hi @mjperrone, knowing what I know now, I would probably look into when I would want VRL compile-time errors to occur vs. when I would want fallibility to kick in. For example, if zip is given two literal arrays of different lengths, that could be a VRL compile-time error. If you give zip an expression that returns an array, the length might not be known until runtime, so zip could be made fallible in that case. Some failure cases off the top of my head:

  1. literal arrays of different lengths: compile time error (done in impl Function's fn compile, as of my last contribution)
    zip(['a'], [1, 2])
    # error[E610]: function compilation error: error[E403] invalid argument
    # error: "left" must be the same length as "right"
    
  2. expressions resolving to arrays: zip becomes fallible (done in impl FunctionExpression's fn resolve) and returns an error if the resolved arrays do not match in length
    zip(array_with_one_element(), [1, 2])
    #   │ expression can result in runtime error
    #   │ handle the error case to ensure runtime success
    
    zipped_result, err = zip(array_with_one_element(), [1, 2])
    # 👍
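
The runtime half of this idea (case 2) could look roughly like the following Rust sketch. It only models the length check with a plain `Result`. The name `zip_strict` and the exact error wording are assumptions, and this is not the actual VRL `Function` / `FunctionExpression` code:

```rust
use std::collections::BTreeMap;

/// Hypothetical strict variant: returns an error instead of truncating
/// when the two inputs differ in length.
fn zip_strict(left: &[&str], right: &[i64]) -> Result<BTreeMap<String, i64>, String> {
    if left.len() != right.len() {
        return Err(format!(
            "\"left\" must be the same length as \"right\" (got {} and {})",
            left.len(),
            right.len()
        ));
    }
    Ok(left
        .iter()
        .zip(right.iter())
        .map(|(k, v)| (k.to_string(), *v))
        .collect())
}

fn main() {
    // The caller is forced to handle the mismatch, much like a fallible VRL call.
    match zip_strict(&["a"], &[1, 2]) {
        Ok(obj) => println!("zipped: {:?}", obj),
        Err(err) => println!("error: {}", err),
    }
}
```

The compile-time check for literal arrays has no direct equivalent in this sketch; it would have to live in the function's compile step, as described in case 1.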
    

Another idea is to zip two arrays of mismatched lengths by padding the shorter one with the VRL version of a null value, which I think it does have.
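
A hedged Rust sketch of that padding idea, with `None` standing in for a null value. The name `zip_padded` is hypothetical, and dropping surplus values that have no key is an assumption of this sketch rather than a decided behavior:

```rust
/// Hypothetical padding variant: keys without a matching value get `None`
/// (standing in for null). Surplus values that have no key are dropped,
/// which is an assumption of this sketch.
fn zip_padded(keys: &[&str], values: &[i64]) -> Vec<(String, Option<i64>)> {
    keys.iter()
        .enumerate()
        .map(|(i, k)| (k.to_string(), values.get(i).copied()))
        .collect()
}

fn main() {
    // "b" has no matching value, so it is paired with None.
    println!("{:?}", zip_padded(&["a", "b"], &[1])); // [("a", Some(1)), ("b", None)]
}
```

Padding never fails, but it pushes the handling of missing values downstream to whoever consumes the result.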

Sorry if I'm being a bit vague here and there, I haven't had time to come back to this code base! More experienced contributors may be able to provide more direction, especially with what the error handling behavior of zip should be.

briankung avatar Oct 16 '22 14:10 briankung

Hi Brian. That is an interesting edge case to consider during implementation.

My question is actually about how to achieve this result using VRL as it is today, without the zip function being defined. I am trying to do that like this:

    obj = {}
    for_each(array!(.data[0])) -> |j, value| {
        set!(obj, get!(fields, [j]), value)
    }

but running into

2022-10-16T19:16:11.211353Z ERROR transform{component_kind="transform" component_id=unbatch_api_sensor_data component_type=remap component_name=unbatch_api_sensor_data}: vector::internal_events::remap: Mapping failed with event. error="function call error for \"for_each\" at (38:130): function call error for \"set\" at (89:124): expected string or array, got string" error_type="conversion_failed" stage="processing" internal_log_rate_limit=true

(data[0] is an array of the same size as fields). I was hoping someone like @jszwedko might have a working way to do this in VRL.

mjperrone avatar Oct 16 '22 19:10 mjperrone

Figured it out: set doesn't mutate its argument, so the result has to be assigned back to obj, and dynamic paths are given as arrays:

    obj = {}
    for_each(array!(data[0])) -> |j, value| {
        obj = set!(obj, [get!(fields, [j])], value)
    }

mjperrone avatar Oct 16 '22 21:10 mjperrone