
Function names & URIs

Open • westonpace opened this issue 2 months ago • 7 comments

This is in relation to some discussion that came up in #631.

The duplicated names prevent substrait-java from being updated.

However, there is a question around whether names need to be unique across ALL extensions, or just within a single extension file. While these functions were added in error (I think), you could make the case that:

functions_arithmetic/min:timestamp,max:timestamp
functions_datetime/min:timestamp,max:timestamp

should be treated as different functions.

My understanding is that a fully qualified function name is the triple:

<function uri>,<function name>,<function arguments>
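A minimal sketch of why the triple matters, assuming a hypothetical in-memory registry. FunctionKey and from_compound are illustrative names, not part of any Substrait library, and the short strings "functions_arithmetic" and "functions_datetime" stand in for full URIs. When the URI is part of the key, the two min:timestamp definitions from the example above occupy different slots; drop the URI and they collide.

```python
from typing import NamedTuple


class FunctionKey(NamedTuple):
    """Hypothetical fully qualified identity: (uri, name, argument signature)."""
    uri: str
    name: str
    signature: str


def from_compound(uri: str, compound: str) -> FunctionKey:
    # Split a compound name such as "min:timestamp" into its name and signature.
    name, _, signature = compound.partition(":")
    return FunctionKey(uri, name, signature)


# The same compound name registered from two different extension files
# (shorthand strings used in place of real URIs).
registry = {
    from_compound("functions_arithmetic", "min:timestamp"): "min from arithmetic",
    from_compound("functions_datetime", "min:timestamp"): "min from datetime",
}

# Keyed by the full triple, the two entries stay distinct...
print(len(registry))  # 2

# ...but if the URI is dropped, they collapse into a single key.
name_only = {(key.name, key.signature) for key in registry}
print(len(name_only))  # 1
```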

Therefore, yes, these are different functions (because the function uri is different). However, there are (at least) three problems that have never really been fully resolved:

  1. No one uses the function URI

The major producers I am aware of today (isthmus, duckdb, ibis) either leave the function URI undefined, set it to the empty string, or set it to "/" (I think we actually have all three behaviors across the three producers :face_exhaling:).

Correspondingly, consumers tend to ignore this field. The one exception I'm aware of is Acero. It tolerates "/", the empty string, and an undefined URI; in those cases it goes into a "fallback" mode that does name-only matching, accepting any registered function with the same name regardless of its URI. It also accepts URLs of the form https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml, and it has a special URI, urn:arrow:substrait_simple_extension_function, which means "use the Arrow compute function with the given name" (this is how we support UDFs).
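The matching rules described for Acero can be sketched as a small resolver. This is a hypothetical Python illustration, not Acero's code or API: resolve, the (uri, name) registry layout, and the double_it UDF are assumptions made for the example, and argument signatures are ignored for brevity.

```python
from typing import Callable, Dict, Optional, Tuple

# URI values that trigger name-only matching; None stands in for "undefined".
FALLBACK_URIS = {None, "", "/"}
# Special URI meaning "use the engine's own function with this name".
ARROW_UDF_URN = "urn:arrow:substrait_simple_extension_function"


def resolve(
    uri: Optional[str],
    name: str,
    registry: Dict[Tuple[str, str], Callable],
    engine_functions: Dict[str, Callable],
) -> Callable:
    """Hypothetical resolver following the matching rules described above."""
    if uri == ARROW_UDF_URN:
        # UDF path: look the name up among the engine's own functions.
        return engine_functions[name]
    if uri in FALLBACK_URIS:
        # Fallback: return the first registered function with this name,
        # whatever URI it was registered under.
        for (_, registered_name), fn in registry.items():
            if registered_name == name:
                return fn
        raise KeyError(name)
    # Otherwise require an exact (uri, name) match.
    return registry[(uri, name)]


ARITH = "https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml"
registry = {(ARITH, "add"): lambda a, b: a + b}
engine_functions = {"double_it": lambda x: x * 2}  # hypothetical UDF

print(resolve(ARITH, "add", registry, engine_functions)(1, 2))  # 3 (exact match)
print(resolve("/", "add", registry, engine_functions)(1, 2))  # 3 (name-only fallback)
print(resolve(ARROW_UDF_URN, "double_it", registry, engine_functions)(21))  # 42
```

With duplicated names, such as min registered from two different files, the fallback branch simply returns whichever registration it reaches first, which is exactly the ambiguity this issue is about.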

  2. What is the canonical URL?

There are several choices. For example:

```
# Stable URI but potentially unstable definition
https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml
# Stable definition but unstable URI
https://github.com/substrait-io/substrait/blob/v0.47.0/extensions/functions_arithmetic.yaml
```

My preference is the former (the main-branch URL), for practical reasons.
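To make the tradeoff concrete, here is a minimal sketch of how a consumer might split either URL form into a ref and a file path. It assumes the github.com/owner/repo/blob/ref/path layout shown above and a ref that contains no "/"; parse_extension_url and ExtensionRef are hypothetical names.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ExtensionRef(NamedTuple):
    ref: str   # branch or tag, e.g. "main" or "v0.47.0"
    path: str  # file path inside the repository


def parse_extension_url(url: str) -> ExtensionRef:
    """Split a github.com/<owner>/<repo>/blob/<ref>/<path> URL (assumed layout)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected shape: [owner, repo, "blob", ref, *path]; assumes ref has no "/".
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    return ExtensionRef(ref=parts[3], path="/".join(parts[4:]))


main_url = "https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml"
pinned_url = "https://github.com/substrait-io/substrait/blob/v0.47.0/extensions/functions_arithmetic.yaml"

a, b = parse_extension_url(main_url), parse_extension_url(pinned_url)
print(a.path == b.path)  # True: same extension file
print(a.ref, b.ref)      # main v0.47.0
```

A consumer could key functions on the path alone (treating both URLs as one extension) or on (ref, path) (treating pinned definitions as distinct); which behavior is right is tied to the versioning question below.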

  3. How is versioning handled?

This is discussed in more detail in https://github.com/substrait-io/substrait/issues/274

But the basic question is "what if we have tons of users and we decide to make a change to some function?"

westonpace, Apr 18 '24 22:04