[R] please write unregister_scalar_function and/or make registration local/temporary
Describe the enhancement requested
The current mechanism for register_scalar_function is solely by side-effect, and keeping them around in the global environment can affect other uses of lazy operations.
I suspect that its "global" side-effect use is fine for most, since a lazy object can be realized/collected at any point, and the registered function must exist when collected. I suggest that this risk can be mitigated with the assumption that de-registration must be after collection.
This might be as simple as what I hope would happen with the following, without the error:
register_scalar_function("arrow_dirname", function(context, z) dirname(z), in_type=string(), out_type=string())
environment(register_scalar_function)$.cache$functions$arrow_dirname
# function (...)
# Expression$create("arrow_dirname", ...)
# <environment: namespace:arrow>
environment(register_scalar_function)$.cache$functions$arrow_dirname <- NULL
# Error in loadNamespace(x) : there is no package called ‘*tmp*’
Component(s)
R
Hi @r2evans, thanks for the issue.
Did you encounter this problem in a more typical, real-world scenario?
If it helps, the package already provides unregister_binding which may be enough for your use case,
https://github.com/apache/arrow/blob/8be5f9c6b41ddd840939602016811aa9c739f0a3/r/tests/testthat/test-udf.R#L87-L93
Haven't tested it though.
Do you mean the not-exported arrow:::unregister_binding? The only exported unreg* function is unregister_extension_type.
While I didn't know about that function, it'd be bad-form in my packages to use :::-functions. Perhaps this issue could be relabeled as "export unregister_binding :-)
I can confirm that it does what I expected:
arrow::register_scalar_function(
"quux", function(context, x) "",
in_type=arrow::schema(x=arrow::string()), out_type=arrow::string(),
auto_convert=TRUE)
"quux" %in% names(environment(arrow::register_scalar_function)$.cache$functions)
# [1] TRUE
arrow:::unregister_binding("quux")
"quux" %in% names(environment(arrow::register_scalar_function)$.cache$functions)
# [1] FALSE
That's the one. I think a PR exporting it would be welcome.
I suspect a PR that does no more than
diff --git a/NAMESPACE b/NAMESPACE
index 412d70e..e18a425 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -407,6 +407,7 @@ export(uint32)
export(uint64)
export(uint8)
export(unify_schemas)
+export(unregister_binding)
export(unregister_extension_type)
export(utf8)
export(value_counts)
diff --git a/R/dplyr-funcs.R b/R/dplyr-funcs.R
index c0eb47e..507b51d 100644
--- a/R/dplyr-funcs.R
+++ b/R/dplyr-funcs.R
@@ -75,6 +75,7 @@ register_binding <- function(fun_name,
invisible(previous_fun)
}
+#' @export
unregister_binding <- function(fun_name) {
unqualified_name <- sub("^.*?:{+}", "", fun_name)
previous_fun <- .cache$functions[[unqualified_name]]
is the bare-minimum, but could spark some architecture discussion. For instance, to me it makes sense to document both register_binding and unregister_binding together, but the former is marked as @keywords internal, likely for a reason.
Perhaps the practical aspects of having extraneous functions added to the list of available scalar-functions is more a theoretical discussion than one with practical implications. Having now seen unregister_binding confirms in my head that it is as simple as <- NULL-removing it from the internal list, so I'm more comfortable with this understanding of how/where it's stored.
Thanks.
This issue has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this issue will be closed in 14 days. If this improvement is still desired but has no current owner, please add the 'Status: needs champion' label.