substrait
Function names & URIs
This relates to a discussion that came up in #631.
The duplicated names prevent substrait-java from being updated.
However, there is a question of whether names need to be unique across ALL extensions or just within a single extension file. While these functions were added in error (I think), you could make the case that:
functions_arithmetic/min:timestamp,max:timestamp
functions_datetime/min:timestamp,max:timestamp
should be treated as different functions.
My understanding is that a fully qualified function name is the triple:
<function uri>,<function name>,<function arguments>
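As a rough illustration of that triple (a sketch only: the `FunctionKey` type and the short URI strings are placeholders I made up for this example, not anything defined by Substrait), the two `min:timestamp` entries compare as different keys even though their name and arguments match:

```python
from typing import NamedTuple, Tuple


class FunctionKey(NamedTuple):
    """Hypothetical fully qualified function name: (uri, name, arguments)."""
    uri: str
    name: str
    args: Tuple[str, ...]


# Placeholder URIs standing in for the two extension files.
arith_min = FunctionKey("functions_arithmetic", "min", ("timestamp",))
datetime_min = FunctionKey("functions_datetime", "min", ("timestamp",))

# Compared as full triples, the two keys are distinct.
distinct_by_triple = arith_min != datetime_min

# Compared by name and arguments only, they collide.
same_without_uri = (arith_min.name, arith_min.args) == (
    datetime_min.name,
    datetime_min.args,
)
```
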
Therefore, yes, these are different functions (because the function URI is different). However, there are (at least) three problems that have never been fully resolved:
- No one uses the function URI
The major producers that I am aware of today (isthmus, duckdb, ibis) set the function URI to undefined, the empty string, or /.
(I think we actually have all three behaviors across the three producers :face_exhaling: )
Correspondingly, consumers tend to ignore this field. The one exception I'm aware of is Acero. It tolerates /, the empty string, and undefined by going into a "fallback" mode, where it does name-only matching and will match any registered function with the same name regardless of the URI. It also accepts URLs of the form https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml, and it has a special URI, urn:arrow:substrait_simple_extension_function, which means "use the arrow compute function with the given name" (this is how we support UDFs).
- What is the canonical URL?
There are several choices. For example:
# Stable URI but potentially unstable definition
https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml
# Stable definition but unstable URI
https://github.com/substrait-io/substrait/blob/v0.47.0/extensions/functions_arithmetic.yaml
My preference is the former, for practical reasons.
- How is versioning handled?
This is discussed in more detail in https://github.com/substrait-io/substrait/issues/274, but the basic question is "what if we have tons of users and we decide to make a change to some function?"
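To make the first problem concrete, here is a minimal sketch of a consumer that does exact triple matching when a URI is supplied and falls back to name-only matching when the URI is undefined, empty, or /. This is not Acero's actual code. The `Registry` class and its methods are hypothetical, and the `functions_datetime.yaml` URL is assumed to follow the same form as the arithmetic one.

```python
from typing import Dict, List, Optional, Sequence, Tuple

BASE = "https://github.com/substrait-io/substrait/blob/main/extensions/"
ARITHMETIC_URI = BASE + "functions_arithmetic.yaml"
DATETIME_URI = BASE + "functions_datetime.yaml"  # assumed, same URL form

# URI values that producers emit when they do not really set the field.
UNSET_URIS = (None, "", "/")

Key = Tuple[str, str, Tuple[str, ...]]  # (uri, name, args)


class Registry:
    """Hypothetical function registry keyed by the full triple."""

    def __init__(self) -> None:
        self._functions: Dict[Key, str] = {}

    def register(self, uri: str, name: str, args: Sequence[str], impl: str) -> None:
        self._functions[(uri, name, tuple(args))] = impl

    def resolve(self, uri: Optional[str], name: str, args: Sequence[str]) -> List[str]:
        """Return every implementation that matches the request."""
        if uri in UNSET_URIS:
            # Fallback mode: match on the name alone, regardless of URI.
            return [
                impl
                for (_, known_name, _), impl in self._functions.items()
                if known_name == name
            ]
        impl = self._functions.get((uri, name, tuple(args)))
        return [impl] if impl is not None else []


registry = Registry()
registry.register(ARITHMETIC_URI, "min", ["timestamp"], "arithmetic min")
registry.register(DATETIME_URI, "min", ["timestamp"], "datetime min")

# With a real URI the lookup is unambiguous.
exact = registry.resolve(ARITHMETIC_URI, "min", ["timestamp"])

# With "/" the consumer cannot tell the two definitions apart.
fallback = registry.resolve("/", "min", ["timestamp"])
```

In this sketch `exact` holds a single implementation while `fallback` holds both, which is why duplicated names across extension files only become a problem when the URI is ignored.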