TileDB-R icon indicating copy to clipboard operation
TileDB-R copied to clipboard

0.21.0 broken on macOS 12 and below when built via Macports

Open barracuda156 opened this issue 9 months ago • 23 comments

** testing if installed package can be loaded from temporary location
sh: line 1: 82765 Illegal instruction: 4  R_TESTS= '/opt/local/Library/Frameworks/R.framework/Resources/bin/R' --no-save --no-restore --no-echo 2>&1 < '/opt/local/var/macports/build/_opt_bblocal_var_buildworker_ports_build_ports_R_R-tiledb/R-tiledb/work/.tmp/RtmpB4Gi86/file13fce631f3ee1'

 *** caught illegal operation ***
address 0x10bdb08e2, cause 'illegal opcode'

Traceback:
 1: dyn.load(file, DLLpath = DLLpath, ...)
 2: library.dynam(lib, package, package.lib)
 3: loadNamespace(package, lib.loc)
 4: doTryCatch(return(expr), name, parentenv, handler)
 5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 6: tryCatchList(expr, classes, parentenv, handlers)
 7: tryCatch({    attr(package, "LibPath") <- which.lib.loc    ns <- loadNamespace(package, lib.loc)    env <- attachNamespace(ns, pos = pos, deps, exclude, include.only)}, error = function(e) {    P <- if (!is.null(cc <- conditionCall(e)))         paste(" in", deparse(cc)[1L])    else ""    msg <- gettextf("package or namespace load failed for %s%s:\n %s",         sQuote(package), P, conditionMessage(e))    if (logical.return && !quietly)         message(paste("Error:", msg), domain = NA)    else stop(msg, call. = FALSE, domain = NA)})
 8: library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE)
 9: withCallingHandlers(expr, packageStartupMessage = function(c) tryInvokeRestart("muffleMessage"))
10: suppressPackageStartupMessages(library(pkg_name, lib.loc = lib,     character.only = TRUE, logical.return = TRUE))
11: doTryCatch(return(expr), name, parentenv, handler)
12: tryCatchOne(expr, names, parentenv, handlers[[1L]])
13: tryCatchList(expr, classes, parentenv, handlers)
14: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if (!is.null(call)) {        if (identical(call[[1L]], quote(doTryCatch)))             call <- sys.call(-4L)        dcall <- deparse(call, nlines = 1L)        prefix <- paste("Error in", dcall, ": ")        LONG <- 75L        sm <- strsplit(conditionMessage(e), "\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")        if (is.na(w))             w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],                 type = "b")        if (w > LONG)             prefix <- paste0(prefix, "\n  ")    }    else prefix <- "Error : "    msg <- paste0(prefix, conditionMessage(e), "\n")    .Internal(seterrmessage(msg[1L]))    if (!silent && isTRUE(getOption("show.error.messages"))) {        cat(msg, file = outFile)        .Internal(printDeferredWarnings())    }    invisible(structure(msg, class = "try-error", condition = e))})
15: try(suppressPackageStartupMessages(library(pkg_name, lib.loc = lib,     character.only = TRUE, logical.return = TRUE)))
16: tools:::.test_load_package("tiledb", "/opt/local/var/macports/build/_opt_bblocal_var_buildworker_ports_build_ports_R_R-tiledb/R-tiledb/work/destroot/opt/local/Library/Frameworks/R.framework/Versions/4.3/Resources/library/00LOCK-tiledb/00new")
An irrecoverable exception occurred. R is aborting now ...
ERROR: loading failed

https://build.macports.org/builders/ports-10.15_x86_64-builder/builds/155770/steps/install-port/logs/stdio

The same on every macOS version from 12 down, examples: https://build.macports.org/builders/ports-12_x86_64-builder/builds/82096/steps/install-port/logs/stdio https://build.macports.org/builders/ports-11_x86_64-builder/builds/127411/steps/install-port/logs/stdio https://build.macports.org/builders/ports-10.13_x86_64-builder/builds/205106/steps/install-port/logs/stdio

barracuda156 avatar Sep 15 '23 14:09 barracuda156

Thanks for the report. It works at CRAN where macOS binaries for tiledb-r 0.21.0 have already appeared. Quoting from the CRAN page now and less than 20 or so hours after this was uploaded:

macOS binaries: r-release (arm64): tiledb_0.21.0.tgz, r-oldrel (arm64): tiledb_0.21.0.tgz, r-release (x86_64): tiledb_0.21.0.tgz, r-oldrel (x86_64): tiledb_0.20.3.tgz

As we do not work with macports nor consider this to be a direct release architecture, maybe you can liase with them and smooth this out? If you find that something needs to change at our end we will gladly consider clean PRs.

eddelbuettel avatar Sep 15 '23 14:09 eddelbuettel

@eddelbuettel Thank you for responding. We will look into that, I have already opened a ticket on Trac too: https://trac.macports.org/ticket/68162

barracuda156 avatar Sep 15 '23 14:09 barracuda156

@eddelbuettel MacPorts just builds the software created by its developers. When it does not build, we report things to the developers so that they can fix them. Sergey is the maintainer of this software in MacPorts and he is reporting to you that it does not build. This does not mean MacPorts is the problem; it means there is a build scenario you haven't considered and should consider. At the moment, I suspect it may be the compiler version. I will add more notes to the MacPorts ticket.

ryandesign avatar Sep 15 '23 21:09 ryandesign

I concur that it is likely the compiler version, and we do state our requirements.

I retirerate that this is a package written for R ecosystem, and CRAN sets pretty clear standards. In particular, the macOS maintainer for CRAN and R Core is quite adamant in his recommendation that 'his' tools from mac.r-project.org should be used (even though in real life many people use homebrew "just because"). So it appears that we are standing in the middle of a little skirmish between macports and CRAN. We have little to add here: we use CRAN tooling, and that works.

eddelbuettel avatar Sep 15 '23 21:09 eddelbuettel

@eddelbuettel Well, if something does not build on every recent macOS on the standard arch (x86_64) and with the modern and standard compiler (clang-15), it suggests that there is likely a bug in the code, even if CRAN tools may get around or obscure it. Also notice that aarch64 builds are fine: https://ports.macports.org/port/R-tiledb/details

(Were the error happening only on old macOS or unsupported archs, I would first assume the problem is there; here, however, GCC builds R-tiledb fine on my 10.6.8, but not with Clang on new macOS.)

barracuda156 avatar Sep 15 '23 21:09 barracuda156

@barracuda156 You have access to the platform where it fails, we do not. That means you can reproduce, we cannot. As I said in first comment, we will gladly entertain clean and well reasoned PRs without side effects. Otherwise our work may prioritise progress on platforms used by our clients and/or us. This currently covers neither powerpc, nor macports.

eddelbuettel avatar Sep 15 '23 21:09 eddelbuettel

I see these requirements listed:

https://github.com/TileDB-Inc/TileDB-R/blob/3c675c6135d35388df3bab9c804f46956637e3aa/DESCRIPTION#L20-L21

This appears to be a typo; you appear to have meant macOS 10.14, not 11.14 (which does not exist).

In this issue we are talking about macOS 12.6.8 on x86_64 building with MacPorts clang 15.0.7. PowerPC processors are not involved.

I am not familiar with R or CRAN or its policies or recommendations. MacPorts is a build from source system. Sergey has been instrumental in getting at this point over well over 4,000 R modules into MacPorts for the benefit of R users who like to install their software that way, and we greatly appreciate his efforts in improving MacPorts.

ryandesign avatar Sep 15 '23 21:09 ryandesign

Indeed a typo. Code is truth, as always. configure.ac has

https://github.com/TileDB-Inc/TileDB-R/blob/3c675c6135d35388df3bab9c804f46956637e3aa/configure.ac#L161-L166

Again, we work with what we have access to, which is CRAN's setup on macOS. I think we should close this now as we are going in circles. We can support your port on a best-efforts, basis but our priorities lie elsewhere so we cannot port this for you. (I have a personal side project support 20k binaries from CRAN for Ubuntu on two Ubuntu LTS build flavours but life is easier on x86_64: https://eddelbuettel.github.io/r2u)

eddelbuettel avatar Sep 15 '23 21:09 eddelbuettel

By the way, the build succeeds on all macOS versions (11, 12, 13) on arm64, but only succeeds on macOS 13 on x86_64.

ryandesign avatar Sep 15 '23 22:09 ryandesign

When arm64 was added I think as minimum compile version was imposed.

I will close (and possibly lock) this now. I am not part of the r-sig-mac mailing list and I cannot help you any further: our repository contains as R package which builds on the supported macOS builds. For the rest (including ports to platforms we do not have) you are on your own. We will try to help, I do not feel like this discusssion has helped anyone (besides you informing about a type I shall fix, so thank you).

eddelbuettel avatar Sep 15 '23 22:09 eddelbuettel

As I said, at MacPorts we encourage our contributors to report problems to the developers so that they can be fixed at the source. By closing such bug reports without addressing them, you are sending the message that you do not want bug reports and you disincentivize your users and partners from reporting any future problems—even those that might affect your supported platforms, because how would we know the scope of the problem? As a result, the quality of your software will suffer.

ryandesign avatar Sep 15 '23 22:09 ryandesign

Thanks. We heard you the first time. We wish you well.

eddelbuettel avatar Sep 15 '23 22:09 eddelbuettel

@barracuda156 @ryandesign thanks for opening the issue. Circling back here, we're happy to keep this open as a line of communication while you debug.

The only thing that comes to mind immediately as a potential cause for illegal instruction error is the AVX2 routines in libtiledb; those are supposed to be excluded by this cmake check on CPUs which don't support AVX2 -- but I don't think we've ever confirmed that to work correctly except on Linux.

Assuming we are not enabling the AVX2 code, and libtiledb + tiledb-r are both compiled completely from source, then I would start to consider other possibilities. It would be very helpful to know exactly what CPU architecture you are running on, and whether all build+tests processes in macports are guaranteed to occur on the same build node.

ihnorton avatar Sep 16 '23 04:09 ihnorton

Note that the configure step for tiledb-r does not run cmake so unless the user specified a pre-made library no cmake detection is done as the build defaults to a fallback of using premade artifacts.

As detailed in the installation options vignette if and when a library has been built locally, potentially better reflecting local platform settings, then R CMD INSTALL --configure-args='--with-tiledb=/some/path' tiledb_*.tar.gz can use it. For now, all we know is that the standard build step for macOS does not work for the macports setup.

eddelbuettel avatar Sep 16 '23 04:09 eddelbuettel

@ihnorton @eddelbuettel Just to be clear:

  1. The issue does not happen across all Macports setup, but is specific to some systems (though mostly all Intel builds fail on buildbots, macOS 13 one is fine, as well as all arm64 and, FWIW, powerpc builds). macOS 13 should have identical builds of both TileDB itself and its R package with macOS 12 and others down to 10.13, I think. We do not consider earlier OS here (due to an unresolved so far filesystem issue TileDB won’t build on earlier ones with Clang, and unfortunately Macports won’t allow me to switch builds to GCC, unless it is PowerPC).
  2. We do use settings to allow building from source correctly, linking to libraries in Macports prefix, and also add fixes for bugs which we are able to fix. That is, a build of something may not run identically to what was presumed to be a macOS config by developers. (We hope that in most cases our setup is objectively better than OS+Xcode one, since we are not limited by Apple poor support for Xcode and can use the latest compilers and runtime, though for Intel that should be still Clang and libc++. Subsequently, in most cases artificially imposed restriction for macOS version target is unneeded and should be dropped.) Point 1, however, implies that whatever Macports setup is, it is not cause of the failure here, or at least not the only cause, otherwise it should have failed uniformly for all Intel systems or for systems using the same compiler, which is not the case.

P. S. I will look into what happens with Intel-specific build choices, but the error looks unfamiliar to me. I believe, I never saw it happening with R ports, and we build quite a number of them.

barracuda156 avatar Sep 16 '23 08:09 barracuda156

We have seen this on aging CPUs: the illegal instruction is eg what happens when you hit AVX2 code a cpu too old to support. I have also seen it with package arrow on an ancient Xeon I use for bulk tests.

eddelbuettel avatar Sep 16 '23 11:09 eddelbuettel

@ryandesign Ryan, could you please check locally if it builds for you on any recent macOS (x86_64) which is convenient to test? Just to make sure it is not only our buildbots which fail.

barracuda156 avatar Sep 16 '23 16:09 barracuda156

Also FWIW before downloading a convenience pre-made library for users on Linux we check for avx2 availability. On the more controlled macOS side that was never an issue / is not an issue for CRAN from where our users get their macOS builds:

https://github.com/TileDB-Inc/TileDB-R/blob/3c675c6135d35388df3bab9c804f46956637e3aa/tools/fetchTileDBLib.R#L33-L43

If your hardware deviates, your best bet (and documented option) is to compile TileDB Embedded locally for use by the TileDB R package.

eddelbuettel avatar Sep 16 '23 16:09 eddelbuettel

to compile TileDB Embedded locally for use by the TileDB R package.

@eddelbuettel This is precisely what is being done. tiledb is built as a separate port, then R-tiledb links to it: https://github.com/macports/macports-ports/blob/a8e9fe090cfec43386796917595a81e31a6802b5/R/R-tiledb/files/patch-unbreak-build.diff#L24-L25

barracuda156 avatar Sep 16 '23 16:09 barracuda156

I am sure you will be able to devise a workaround given that you can reproduce it. We cannot.

(Also unclear from your diff. The same logic applies also for downloaded blobs. But life is too short for arguing here so I will no longer engage. I gave you the help I can give you.)

eddelbuettel avatar Sep 16 '23 16:09 eddelbuettel

I am sure you will be able to devise a workaround given that you can reproduce it. We cannot.

There is no implication that anyone should, but in a case someone wishes to, it is trivial to reproduce Macports setup, provided hardware is available: for x86_64 pre-built ports are there, so sudo port -v install R-tiledb will get all through from scratch – either to a successful build (if the error was due to old cpus of buildbots) or to a failure of R-tiledb build, if the cause is something else. Everything prior to it will be downloaded pre-built (by default). Should take a few minutes from installation of Macports.

barracuda156 avatar Sep 16 '23 17:09 barracuda156

Thanks for the information, @barracuda156.

ihnorton avatar Sep 16 '23 17:09 ihnorton

@ihnorton Perhaps worth adding: if Macports is just installed, run sudo port sync first to ensure the latest versions of ports are used.

barracuda156 avatar Sep 16 '23 17:09 barracuda156