crystal icon indicating copy to clipboard operation
crystal copied to clipboard

`Enumerable` method to find the first truthy block result and return that result

Open jgaskins opened this issue 1 year ago • 8 comments

Feature Request

Is your feature request related to a problem? Please describe clearly and concisely what is it.

I have a frequent need for something that is similar to Enumerable#find, but rather than returning the element for which the block is truthy, it returns that truthy block result itself.

This comes up when I have an array of objects that may or may not have a property set. The most recent need for this came up when implementing tool use for LLM APIs. The API response has an array of choices, each with a message, and those messages may or may not have a tool_calls array.

record Response, choices : Array(Choice)
record Choice, message : Message
record Message, tool_calls : Array(ToolCall)?
record ToolCall

response = Response.new([Choice.new(Message.new([ToolCall.new]))])

Describe the feature you would like, optionally illustrated by examples, and how it will solve the above problem.

What I've been using in my code to make this more expressive is a monkeypatch on Enumerable called find_first:

if tool_calls = response.choices.find_first &.message.tool_calls
  # find the tools for these calls, execute them, and send the results back to the API
end

Describe considered alternative solutions, and the reasons why you have not proposed them as a solution here.

Currently, as far as I can tell, at least, the only ways to do this involve either executing that same criteria again:

if choice_with_tool_call = response.choices.find &.message.tool_calls
  # the method chain for choice.message.tool_calls has to be repeated
  tool_calls = choice_with_tool_call
    .message
    .tool_calls
end

if tool_calls
  # find the tools for these calls, execute them, and send the results back to the API
end

Or this format with the block expanded

# or this
response.choices.each do |choice|
  if tool_calls = choice.message.tool_calls
    # find the tools for these calls, execute them, and send the results back to the API
    break
  end
end

These are both a lot more verbose, especially when there are multiple layers of these arrays. Having a method that returns the truthy value lets us use the shorthand block syntax and have a single expression that does all of these things smoothly.

Does it break backward compatibility, if yes then what's the migration path?

Nope. Purely additive.

jgaskins avatar Aug 07 '24 18:08 jgaskins

To be clear, I'm not attached to the find_first name. It's the name I've been using only because I haven't come up with anything better.

jgaskins avatar Aug 07 '24 18:08 jgaskins

response = Response.new([Choice.new(Message.new(nil)), Choice.new(Message.new([ToolCall.new, ToolCall.new]))])

if tool_calls = response.choices.each { |c| tc = c.message.tool_calls; break tc if tc }
  pp tool_calls # => [ToolCall(), ToolCall()]
end

This is best I could come up with without introducing something new. As reference only.

Blacksmoke16 avatar Aug 07 '24 18:08 Blacksmoke16

It is still slightly clunky, but at least it doesn't involve break:

[1, 2, 3].each.map { |z| evaluate(z) }.find &.itself

If it is fine to find the first not-nil-value, then [1, 2, 3].each.compact_map { |z| evaluate(z) }.first is even better

yxhuvud avatar Aug 07 '24 18:08 yxhuvud

Ruby doesn't have it, but in Elixir it is called Enum.find_value/3

HertzDevil avatar Aug 07 '24 19:08 HertzDevil

Sounds good to me. I'd prefer the name find_value over find_first.

straight-shoota avatar Aug 13 '24 06:08 straight-shoota

In this comment, @Sija brought up an interesting point I hadn't considered. I normally use this for nil checks (and even then only use shorthand blocks), but are there any use cases for false triggering an early return?

Part of me wants to push back against it because I don't know of any other Enumerable methods where that is the case, but also I don't have any stake at all in how false is treated, so I thought I'd bring it here for discussion.

jgaskins avatar Aug 13 '24 19:08 jgaskins

Elixir's find_value skips over both falsy values: nil and false, just FTR.

Sija avatar Aug 13 '24 19:08 Sija

I would argue this method should have the same semantics as find wrt nil vs falsey.

yxhuvud avatar Aug 14 '24 06:08 yxhuvud

I came here to share this observation, but @yxhuvud already did:

Yes, this is the same as:

[1, 2, 3].each.map { |x| evaluate(x) }.find &.itself

~~But the issue is that map materializes the whole array! If it didn't, then we could push back against adding this method.~~

And it is made even more clunky by the fact that .each. is necessary there. If you forget to add it then map would materialize the whole array.

oprypin avatar Nov 02 '24 14:11 oprypin

Also fun fact: These methods map and to_a are exact aliases of each other

  • https://github.com/crystal-lang/crystal/blob/4aac6f2ee3494a593d79eeea389c77037e025adc/src/enumerable.cr#L2007
  • https://github.com/crystal-lang/crystal/blob/4aac6f2ee3494a593d79eeea389c77037e025adc/src/enumerable.cr#L1016

(which makes me wish that map was lazy even more)

oprypin avatar Nov 02 '24 14:11 oprypin

Not only is the method chain clunkier, but since it also involves a heap allocation (Iterators have state), it's also slower:

With tool calls
                find_value   2.09M (477.64ns) (± 1.45%)    0.0B/op        fastest
each.compact_map(&).first? 134.17k (  7.45µs) (± 0.85%)  31.2kB/op  15.60× slower
each.map(&).find(&.itself) 128.57k (  7.78µs) (± 1.45%)  31.2kB/op  16.28× slower

Without tool calls
                find_value   1.57M (638.94ns) (± 5.47%)    0.0B/op        fastest
each.compact_map(&).first? 132.08k (  7.57µs) (± 1.62%)  31.2kB/op  11.85× slower
each.map(&).find(&.itself) 111.00k (  9.01µs) (± 0.92%)  31.2kB/op  14.10× slower
Benchmark code
require "benchmark"

response_with_tool_calls = Response.new([Choice.new(Message.new([ToolCall.new]))])
response_without_tool_calls = Response.new([Choice.new(Message.new(nil))])

# Need to run the block multiple times because each iteration for find_value
# takes less time than the monotonic clock can measure
iterations = 1_000
[response_with_tool_calls, response_without_tool_calls].each do |response|
  tool_calls = response.choices.find_value &.message.tool_calls
  puts "#{tool_calls ? "With" : "Without"} tool calls"
  Benchmark.ips do |x|
    x.report "find_value" { iterations.times { tool_calls = response.choices.find_value &.message.tool_calls } }
    x.report "each.compact_map(&).first?" { iterations.times { tool_calls = response.choices.each.compact_map(&.message.tool_calls).first? } }
    x.report "each.map(&).find(&.itself)" { iterations.times { tool_calls = response.choices.each.map(&.message.tool_calls).find &.itself } }
  end
  puts

  # Side effect so LLVM won't optimize out the benchmark
  pp tool_calls unless tool_calls
end

record Response, choices : Array(Choice)
record Choice, message : Message
record Message, tool_calls : Array(ToolCall)?
record ToolCall

module Enumerable(T)
  # Same implementation as https://github.com/crystal-lang/crystal/pull/14893
  def find_value(if_none = nil, & : T ->)
    each do |i|
      if result = yield i
        return result
      end
    end
    if_none
  end
end

jgaskins avatar Nov 02 '24 18:11 jgaskins