nixpkgs icon indicating copy to clipboard operation
nixpkgs copied to clipboard

rPackages.arrow: fix build v14.0.0.2

Open philipp-baumann opened this issue 1 year ago • 9 comments

Description of changes

-> a clean PR after https://github.com/NixOS/nixpkgs/pull/289274 has been messed up after failing squashing and force-pushing, sorry for this.

Current {arrow} on CRAN is now at 14.0.2.1, but the version of current rPackages.arrrow based on CRAN archive is 14.0.0.2 (hence basing on that): https://cran.r-project.org/web/packages/arrow/index.html . arrow-cpp (libarrow) in Nixpkgs master is currently at 15.0.0, which means that the versions are not compatible when compiling rPackages.arrow. arrow-cpp is a compile- and run-time dependency of rPackages.arrow. Currently, the release cycle is out-of-sync, more details here: https://github.com/apache/arrow/issues/39698 . On the long term, such a mismatch of library versions is likely, and also it seems more likely that the arrow-cpp is more frequently updated on Nixpkgs compared to CRAN updates/adjustments/fixes in rPackages. Hopefully, we don't need to override/patch this this too often in the near future.

Things done

Adjusted buildInputs and nativeBuildInputs for failing build of rPackages.arrow.

  • Built on platform(s)
    • [x] x86_64-linux
    • [ ] aarch64-linux
    • [ ] x86_64-darwin
    • [x] aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • [ ] sandbox = relaxed
    • [ ] sandbox = true
  • [ ] Tested, as applicable:
  • [x] Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • [ ] Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • [ ] (Package updates) Added a release notes entry if the change is major or breaking
    • [ ] (Module updates) Added a release notes entry if the change is significant
    • [ ] (Module addition) Added a release notes entry if adding a new NixOS module
  • [ ] Fits CONTRIBUTING.md.

Add a :+1: reaction to pull requests you find important.

philipp-baumann avatar Feb 28 '24 20:02 philipp-baumann

Some notes on the fixes for darwin:

  • Google cloud storage support: https://arrow.apache.org/docs/developers/cpp/building.html: is provided via abseil-cpp; however this is not yet supported for darwin: => https://github.com/apache/arrow/blob/c6f20a2348b3336f2ceff5b245e5eb10db7f8ce3/cpp/CMakeLists.txt#L705 , but also google cloud storage support is desactivated here for darwin in arrow-cpp: https://github.com/NixOS/nixpkgs/blob/c8e74c2f83fe12b4e5a8bd1abbc090575b0f7611/pkgs/development/libraries/arrow-cpp/default.nix#L43
  • On darwin, we specifically need https://launchpad.net/intltool/ (found via error ld: library not found for -lintl in here https://github.com/pothosware/SoapySDR/issues/293 )

philipp-baumann avatar Feb 28 '24 23:02 philipp-baumann

  • rPackages.arrow Compiles and is functional on linux-x86_64:
> arrow::arrow_info()       
Arrow package version: 14.0.0.2

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs        TRUE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc   TRUE

Memory:
                  
Allocator jemalloc
Current    0 bytes
Max        0 bytes

Runtime:
                          
SIMD Level          avx512
Detected SIMD Level avx512

Build:
                           
C++ Library Version  14.0.0
C++ Compiler            GNU
C++ Compiler Version 13.2.0

> sessionInfo() 
timedatectl: /lib64/libc.so.6: version `GLIBC_2.35' not found (required by /nix/store/dkw193dr7fy5z05y6qllpigjfcr460v6-gfortran-13.2.0-lib/lib/libgcc_s.so.1)
timedatectl: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /nix/store/dkw193dr7fy5z05y6qllpigjfcr460v6-gfortran-13.2.0-lib/lib/libgcc_s.so.1)
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.7 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /nix/store/syr47j2liwflcp68q32l4r87cv4gi7yp-blas-3/lib/libblas.so.3;  LAPACK version 3.12.0

locale:
[1] C

time zone: Europe/Zurich
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0 bit_4.0.5        compiler_4.3.2   magrittr_2.0.3  
 [5] assertthat_0.2.1 R6_2.5.1         cli_3.6.2        glue_1.6.2      
 [9] bit64_4.0.5      vctrs_0.6.5      lifecycle_1.0.4  arrow_14.0.0.2  
[13] rlang_1.1.2      purrr_1.0.2     
Warning message:
In system("timedatectl", intern = TRUE) :
  running command 'timedatectl' had status 1 and error message 'Function not implemented'
  • rPackages.arrow Compiles and is functional on darwin-aarch64:
> arrow::arrow_info()
Arrow package version: 14.0.0.2

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc  FALSE
mimalloc   TRUE

Memory:
                  
Allocator mimalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                           
C++ Library Version  14.0.0
C++ Compiler          Clang
C++ Compiler Version 16.0.6

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin23.3.0 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /nix/store/yr2iy5lgskn97n7dval3q1f0svn8vjxy-blas-3/lib/libblas.dylib 
LAPACK: /nix/store/hngg0chmy7jkhncdx57js5v9vx2ppby0-openblas-0.3.26/lib/libopenblasp-r0.3.26.dylib;  LAPACK version 3.12.0

locale:
[1] C/UTF-8/C/C/C/C

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] arrow_14.0.0.2

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0 bit_4.0.5        compiler_4.3.2   magrittr_2.0.3  
 [5] assertthat_0.2.1 R6_2.5.1         cli_3.6.2        glue_1.6.2      
 [9] bit64_4.0.5      vctrs_0.6.5      lifecycle_1.0.4  rlang_1.1.2     
[13] purrr_1.0.2

philipp-baumann avatar Feb 28 '24 23:02 philipp-baumann

For SIMD support via xsimd on darwin-aarch64, seems that things need to be tweaked for darwin there https://github.com/NixOS/nixpkgs/blob/c8e74c2f83fe12b4e5a8bd1abbc090575b0f7611/pkgs/development/libraries/xsimd/default.nix#L18C1-L21C47

philipp-baumann avatar Feb 29 '24 00:02 philipp-baumann

nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
$ git -c fetch.prune=false fetch --no-tags --force https://github.com/NixOS/nixpkgs master:refs/nixpkgs-review/0
remote: Enumerating objects: 276, done.
remote: Counting objects: 100% (235/235), done.
remote: Compressing objects: 100% (99/99), done.
remote: Total 276 (delta 151), reused 178 (delta 127), pack-reused 41
Receiving objects: 100% (276/276), 413.53 KiB | 8.11 MiB/s, done.
Resolving deltas: 100% (158/158), completed with 76 local objects.
From https://github.com/NixOS/nixpkgs
   3378e4ec1694..5bcb0e67ea41  master     -> refs/nixpkgs-review/0
$ git worktree add /home/spectral-cockpit/.cache/nixpkgs-review/rev-30918fcb11f4e9d099a0a3af1a34c13fe76a1ebe/nixpkgs 5bcb0e67ea412db217d707097cef69a41ee7782d
Preparing worktree (detached HEAD 5bcb0e67ea41)
Updating files: 100% (40186/40186), done.
HEAD is now at 5bcb0e67ea41 Merge pull request #292040 from r-ryantm/auto-update/python311Packages.unstructured
$ nix-env --extra-experimental-features no-url-literals --option system x86_64-linux -f /home/spectral-cockpit/.cache/nixpkgs-review/rev-30918fcb11f4e9d099a0a3af1a34c13fe76a1ebe/nixpkgs -qaP --xml --out-path --show-trace --no-allow-import-from-derivation
$ git merge --no-commit --no-ff 30918fcb11f4e9d099a0a3af1a34c13fe76a1ebe
Automatic merge went well; stopped before committing as requested
$ nix-env --extra-experimental-features no-url-literals --option system x86_64-linux -f /home/spectral-cockpit/.cache/nixpkgs-review/rev-30918fcb11f4e9d099a0a3af1a34c13fe76a1ebe/nixpkgs -qaP --xml --out-path --show-trace --no-allow-import-from-derivation --meta
Nothing to be built.
$ /nix/store/ywlpcc5zikgjhgcv09avzyz7p1sd0f3b-nix-2.15.1/bin/nix-shell /home/spectral-cockpit/.cache/nixpkgs-review/rev-30918fcb11f4e9d099a0a3af1a34c13fe76a1ebe/shell.nix

philipp-baumann avatar Feb 29 '24 00:02 philipp-baumann

@Kupac fyi tried to split overrides up in packagesWithNativeBuildInputs and the arrow-cpp otherOverrides = old: new: code in otherOverrides section. didn't work, so i'm keeping it how it is.

philipp-baumann avatar Feb 29 '24 01:02 philipp-baumann

@Kupac fyi tried to split overrides up in packagesWithNativeBuildInputs and the arrow-cpp otherOverrides = old: new: code in otherOverrides section. didn't work, so i'm keeping it how it is.

i tried it here, and it worked: https://github.com/Kupac/nixpkgs/commit/afcc24e5032c8bffc7d27e985a4f5efa8b135e4c but I don't know if it still works on darwin. Feel free to pull it in

Kupac avatar Feb 29 '24 22:02 Kupac

We could match the version of the two arrows like this, without having to write it out by hand: https://github.com/Kupac/nixpkgs/commit/0603a7a014ccbb4c44226ecdf8fa31cd274ba322 We'd still have to update the hash when the build fails, though.

Kupac avatar Feb 29 '24 22:02 Kupac

We could match the version of the two arrows like this, without having to write it out by hand: Kupac@0603a7a We'd still have to update the hash when the build fails, though.

Thanks @Kupac for your suggestion, had to focus on other stuff, so quite late to the party.

philipp-baumann avatar Mar 05 '24 08:03 philipp-baumann

No worries, we can't FOSS all the time :) What's important is that we make progress, because arrow is a central package in R, and many others depend on it. So hopefully we can get it right ;-)

Kupac avatar Mar 05 '24 20:03 Kupac

@Kupac fyi tried to split overrides up in packagesWithNativeBuildInputs and the arrow-cpp otherOverrides = old: new: code in otherOverrides section. didn't work, so i'm keeping it how it is.

i tried it here, and it worked: Kupac@afcc24e but I don't know if it still works on darwin. Feel free to pull it in

great, it still builds on Darwin. Like the approach with the version you suggested.

philipp-baumann avatar Mar 06 '24 22:03 philipp-baumann

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/prs-already-reviewed/2617/1512

nixos-discourse avatar Mar 07 '24 09:03 nixos-discourse

lesson learnt, never merge again for nixpkgs PRs and always rebase, now addressed https://github.com/NixOS/nixpkgs/pull/294933

philipp-baumann avatar Mar 11 '24 09:03 philipp-baumann