perf: ExpressionReferences::variables and ExpressionReferences::functions should return iterators

Open cgettys-microsoft opened this issue 4 months ago • 0 comments

The main scenarios where you should use ExpressionReferences::variables and ExpressionReferences::functions are if you want to access every variable or every function.

If you have a list of n functions or variables you want to know if are used in an expression, you should probably just use has_variable or has_function - even if you have to call it n times, each of those lookups is O(1) and requires zero allocations.

But if you do need to enumerate the variables or functions, the current implementation has some drawbacks I think this is a clear pessimization:

If you're only going to iterate over the variables or functions once using that vector (not retaining it), which I strongly suspect is common, it's forcing short-lived heap allocations and unnecessary copying of &str into the vector.
While in most cases the has_ methods are preferable anyway, it limits your ability to choose other collections efficiently in scenarios where you don't. E.g. if you want a HashSet or BTreeSet (for intersecting, sorting, merge-joining, whatever), the Vec is just wasted.

What's more, the examples were a bit misleading; the one for ExpressionReferences::functions would be much more efficiently achieved via ExpressionReferences::has_function So improve the docs & doctests a bit.

This is a semver change; code using these two functions might no longer compile. The change required to callers, however, is incredibly minimal. If you called .functions() or .variables(), and do need a Vec, simply use .functions().collect::<Vec<&str>>() or .variables().collect::<Vec<&str>>() instead. In the case where you use it in a for loop or elsewhere (i.e. you just iterated over it anyway), I believe it will likely compile without changes.

Aug 01 '25 21:08 cgettys-microsoft