arrow icon indicating copy to clipboard operation
arrow copied to clipboard

[R] please write unregister_scalar_function and/or make registration local/temporary

Open r2evans opened this issue 1 year ago • 4 comments

Describe the enhancement requested

The current mechanism for register_scalar_function is solely by side-effect, and keeping them around in the global environment can affect other uses of lazy operations.

I suspect that its "global" side-effect use is fine for most, since a lazy object can be realized/collected at any point, and the registered function must exist when collected. I suggest that this risk can be mitigated with the assumption that de-registration must be after collection.

This might be as simple as what I hope would happen with the following, without the error:

register_scalar_function("arrow_dirname", function(context, z) dirname(z), in_type=string(), out_type=string())

environment(register_scalar_function)$.cache$functions$arrow_dirname
# function (...) 
# Expression$create("arrow_dirname", ...)
# <environment: namespace:arrow>

environment(register_scalar_function)$.cache$functions$arrow_dirname <- NULL
# Error in loadNamespace(x) : there is no package called ‘*tmp*’

Component(s)

R

r2evans avatar Oct 04 '24 18:10 r2evans

Hi @r2evans, thanks for the issue.

Did you encounter this problem in a more typical, real-world scenario?

If it helps, the package already provides unregister_binding which may be enough for your use case,

https://github.com/apache/arrow/blob/8be5f9c6b41ddd840939602016811aa9c739f0a3/r/tests/testthat/test-udf.R#L87-L93

Haven't tested it though.

amoeba avatar Oct 10 '24 16:10 amoeba

Do you mean the not-exported arrow:::unregister_binding? The only exported unreg* function is unregister_extension_type.

While I didn't know about that function, it'd be bad-form in my packages to use :::-functions. Perhaps this issue could be relabeled as "export unregister_binding :-)

I can confirm that it does what I expected:

arrow::register_scalar_function(
  "quux", function(context, x) "",
  in_type=arrow::schema(x=arrow::string()), out_type=arrow::string(),
  auto_convert=TRUE)
"quux" %in% names(environment(arrow::register_scalar_function)$.cache$functions)
# [1] TRUE
arrow:::unregister_binding("quux")
"quux" %in% names(environment(arrow::register_scalar_function)$.cache$functions)
# [1] FALSE

r2evans avatar Oct 10 '24 16:10 r2evans

That's the one. I think a PR exporting it would be welcome.

amoeba avatar Oct 10 '24 17:10 amoeba

I suspect a PR that does no more than

diff --git a/NAMESPACE b/NAMESPACE
index 412d70e..e18a425 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -407,6 +407,7 @@ export(uint32)
 export(uint64)
 export(uint8)
 export(unify_schemas)
+export(unregister_binding)
 export(unregister_extension_type)
 export(utf8)
 export(value_counts)
diff --git a/R/dplyr-funcs.R b/R/dplyr-funcs.R
index c0eb47e..507b51d 100644
--- a/R/dplyr-funcs.R
+++ b/R/dplyr-funcs.R
@@ -75,6 +75,7 @@ register_binding <- function(fun_name,
   invisible(previous_fun)
 }
 
+#' @export
 unregister_binding <- function(fun_name) {
   unqualified_name <- sub("^.*?:{+}", "", fun_name)
   previous_fun <- .cache$functions[[unqualified_name]]

is the bare-minimum, but could spark some architecture discussion. For instance, to me it makes sense to document both register_binding and unregister_binding together, but the former is marked as @keywords internal, likely for a reason.

Perhaps the practical aspects of having extraneous functions added to the list of available scalar-functions is more a theoretical discussion than one with practical implications. Having now seen unregister_binding confirms in my head that it is as simple as <- NULL-removing it from the internal list, so I'm more comfortable with this understanding of how/where it's stored.

Thanks.

r2evans avatar Oct 10 '24 18:10 r2evans

This issue has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this issue will be closed in 14 days. If this improvement is still desired but has no current owner, please add the 'Status: needs champion' label.

github-actions[bot] avatar Nov 18 '25 11:11 github-actions[bot]